Query grouped by two swap fields - mongodb

I have a collection, messages, with documents like the following:
{
"_id" : ObjectId("5164218f359f109fd4000012"),
"receiver_id" : ObjectId("5164211e359f109fd4000004"),
"sender_id" : ObjectId("5162de8a359f10cbf700000c"),
"body" : "Hello Billy!!!",
"readed" : false,
"updated_at" : ISODate("2013-04-09T14:11:27.17Z"),
"created_at" : ISODate("2013-04-09T14:11:27.17Z")
}
I need a query that returns the last messages (whether received or sent) for a given user, grouped by receiver_id + sender_id and sorted by created_at.
To better explain the question, an example of how I did it in SQL:
SELECT DISTINCT ON (sender_id+receiver_id) * FROM messages
WHERE sender_id = given_user OR receiver_id = given_user
ORDER BY (sender_id+receiver_id), created_at DESC
I don't understand how to solve this problem with MongoDB.

The Aggregation Framework in MongoDB 2.2+ provides the most obvious translation of your query. The MongoDB manual includes an SQL to Aggregation Framework Mapping Chart as a general guide, although there are definite differences in the two approaches.
Here's a commented example you can try in the mongo shell:
var given_user = ObjectId("5162de8a359f10cbf700000c");
db.messages.aggregate([
    // match: WHERE sender_id = given_user or receiver_id = given_user
    // NB: do the match first, because it can take advantage of an available index
    { $match: {
        $or: [
            { sender_id: given_user },
            { receiver_id: given_user }
        ]
    }},
    { $group: {
        // DISTINCT ON (sender_id+receiver_id)
        _id: { sender_id: "$sender_id", receiver_id: "$receiver_id" }
    }},
    // ORDER by (sender_id+receiver_id)
    // NB: after $group only the _id fields remain, so sort on those; created_at
    // would have to be carried through $group with an accumulator to sort on it
    { $sort: {
        "_id.sender_id": 1,
        "_id.receiver_id": 1
    }}
])
Sample result:
{
"result" : [
{
"_id" : {
"sender_id" : ObjectId("5162de8a359f10cbf700000c"),
"receiver_id" : ObjectId("5164211e359f109fd4000004")
}
}
],
"ok" : 1
}
You may want to add additional fields on the grouping, such as a count of messages received.
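If you actually want to combine sender_id+receiver_id into a single field, you can use the $concat operator in MongoDB 2.4+. Note that $concat only accepts strings, so ObjectId values would need to be available as strings first.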

There is no explicit way to do this. Let's review the workarounds:
Way 1: Run a plain find and do the distinct at the code level afterwards:
db.messages.find({$or: [{sender_id: given_user}, {receiver_id: given_user}]})
Way 2: Use the aggregation framework:
db.messages.aggregate([
    {$match: {$or: [{sender_id: given_user}, {receiver_id: given_user}]}},
    {$group: {_id: {sender: "$sender_id", receiver: "$receiver_id"},
              other: { ... }}},
    {$sort: {"_id.sender": 1, "_id.receiver": 1, ...}}
])
The problem with this approach appears at the sort level, because sorting by sender_id, receiver_id is not the same as sorting by sender_id+receiver_id.
Way 3: Introduce a surrogate field holding sender_id+receiver_id, then use find or even distinct, per Stennie's hint.
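To make the surrogate-field idea concrete, here is a minimal sketch in plain JavaScript (no database involved). It assumes the surrogate key should be the same for A→B and B→A messages, which is what "swap fields" suggests; conversationKey and lastMessages are hypothetical helper names, and plain strings stand in for ObjectId values.

```javascript
// Hypothetical helper: builds an order-independent key for a pair of ids,
// so that (A, B) and (B, A) fall into the same conversation.
function conversationKey(senderId, receiverId) {
  return [String(senderId), String(receiverId)].sort().join("+");
}

// Returns the newest message of every conversation, newest conversation first.
function lastMessages(messages) {
  const latest = new Map();
  for (const msg of messages) {
    const key = conversationKey(msg.sender_id, msg.receiver_id);
    const seen = latest.get(key);
    if (!seen || msg.created_at > seen.created_at) {
      latest.set(key, msg);
    }
  }
  return [...latest.values()].sort((a, b) => b.created_at - a.created_at);
}

// Illustrative data only; the ids are made up.
const sample = [
  { sender_id: "u1", receiver_id: "u2", body: "Hello Billy!!!", created_at: new Date("2013-04-09T14:11:27Z") },
  { sender_id: "u2", receiver_id: "u1", body: "Hi!", created_at: new Date("2013-04-09T15:00:00Z") },
  { sender_id: "u3", receiver_id: "u1", body: "Ping", created_at: new Date("2013-04-08T10:00:00Z") },
];
const result = lastMessages(sample);
console.log(result.map((m) => m.body)); // [ 'Hi!', 'Ping' ]
```

If the same key were stored on each document when it is written, a single-field sort or group on that key would be enough on the database side.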

Related

Get record having highest date inside nested group in Mongodb

I have a record set like the one below:
I need to write a query that, for each data type of every parent, shows the record with the highest date, i.e.
So far I have been able to create two groups, one on parent ID and the other on data type, but I can't work out how to get the record with the max date.
Below is my query:
db.getCollection('Maintenance').aggregate([
    { $group: {
        _id: { parentName: "$ParentID", maintainancename: "$DataType" }
    }},
    { $group: {
        _id: "$_id.parentName",
        maintainancename: {
            $push: { term: "$_id.maintainancename" }
        }
    }}
])
You don't have to $group twice; try the aggregation query below:
db.collection.aggregate([
/** group on two fields `ParentID` & `Datatype`,
* which will leave docs with unique `ParentID + Datatype`
* & use `$max` to get max value on `Date` field in unique set of docs */
{
$group: {
_id: {
parentName: "$ParentID",
maintainancename: "$Datatype"
},
"Date": { $max: "$Date" }
}
}
])
Test: mongoplayground
Note: After the group stage you can use $project or $addFields stages to transform the fields the way you want.
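As a rough picture of what that single $group with $max computes, here is a plain-JavaScript sketch that runs without a database. The field names ParentID, Datatype and Date come from the query above; the sample values and the groupMaxDate helper are made up for illustration.

```javascript
// Emulates: { $group: { _id: { parentName, maintainancename }, Date: { $max: "$Date" } } }
function groupMaxDate(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // One bucket per unique ParentID + Datatype combination.
    const key = JSON.stringify([doc.ParentID, doc.Datatype]);
    const group = groups.get(key);
    if (!group) {
      groups.set(key, {
        _id: { parentName: doc.ParentID, maintainancename: doc.Datatype },
        Date: doc.Date,
      });
    } else if (doc.Date > group.Date) {
      group.Date = doc.Date; // keep the highest date seen so far
    }
  }
  return [...groups.values()];
}

// Illustrative data only.
const docs = [
  { ParentID: 1, Datatype: "A", Date: new Date("2020-01-01T00:00:00Z") },
  { ParentID: 1, Datatype: "A", Date: new Date("2020-03-01T00:00:00Z") },
  { ParentID: 1, Datatype: "B", Date: new Date("2020-02-01T00:00:00Z") },
  { ParentID: 2, Datatype: "A", Date: new Date("2019-12-31T00:00:00Z") },
];
const grouped = groupMaxDate(docs);
console.log(grouped);
```

Note that, like $max, this keeps only the highest date per group, not the rest of the document that carried it.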

MongoDB Aggregation with DBRef

Is it possible to aggregate on data that is stored via DBRef?
Mongo 2.6
Let's say I have transaction data like:
{
_id : ObjectId(...),
user : DBRef("user", ObjectId(...)),
product : DBRef("product", ObjectId(...)),
source : DBRef("website", ObjectId(...)),
quantity : 3,
price : 40.95,
total_price : 122.85,
sold_at : ISODate("2015-07-08T09:09:40.262-0700")
}
The trick is "source" is polymorphic in nature - it could be different $ref values such as "webpage", "call_center", etc that also have different ObjectIds. For example DBRef("webpage", ObjectId("1")) and DBRef("webpage",ObjectId("2")) would be two different webpages where a transaction originated.
I would like to ultimately aggregate by source over a period of time (like a month):
db.coll.aggregate([
    { $match : { sold_at : { $gte : start, $lt : end } } },
    { $project : { source : 1, total_price : 1 } },
    { $group : {
        _id : { "source.$ref" : "$source.$ref" },
        count : { $sum : "$total_price" }
    } }
]);
The catch is that you get a path error when trying to use a field name starting with $, either when grouping by it or when transforming it with expressions in $project.
Any way to do this? Actually trying to push this data via aggregation to a subcollection to operate on it there. Trying to avoid a large cursor operation over millions of records to transform the data so I can aggregate it.
With MongoDB 4, I solved this issue in the following way:
Having this structure:
{
"_id" : LUUID("144e690f-9613-897c-9eab-913933bed9a7"),
"owner" : {
"$ref" : "person",
"$id" : NumberLong(10)
},
...
...
}
I needed to use the "owner.$id" field, but because of the "$" in the field name I was unable to reference it in an aggregation.
I transformed "owner.$id" -> "owner" using the following snippet:
db.activities.aggregate([
{
$addFields: {
"owner": {
$arrayElemAt: [{ $objectToArray: "$owner" }, 1]
}
}
},
{
$addFields: {
"owner": "$owner.v"
}
},
{"$group" : {_id:"$owner", count:{$sum:1}}},
{$sort:{"count":-1}}
])
Detailed explanations here - https://dev.to/saurabh73/mongodb-using-aggregation-pipeline-to-extract-dbref-using-lookup-operator-4ekl
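For readers who want to see what the two $addFields stages do to a single document, here is a plain-JavaScript sketch (no database needed). The objectToArray helper is a hypothetical stand-in for the $objectToArray operator, and the sketch assumes, as the pipeline does, that "$id" is the second field of the DBRef.

```javascript
// Stand-in for $objectToArray: turns { a: 1 } into [ { k: "a", v: 1 } ].
function objectToArray(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// A simplified document with a DBRef-shaped "owner" field.
const activity = { owner: { $ref: "person", $id: 10 } };

// Stage 1: { $arrayElemAt: [{ $objectToArray: "$owner" }, 1] } picks the second pair.
const pairs = objectToArray(activity.owner);
const secondPair = pairs[1];

// Stage 2: "$owner.v" keeps only the value of that pair.
const ownerId = secondPair.v;

console.log(pairs);   // [ { k: '$ref', v: 'person' }, { k: '$id', v: 10 } ]
console.log(ownerId); // 10
```

Because the value is picked by position, the approach depends on the DBRef fields being stored in the order "$ref", "$id".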
You cannot use DBRef values with the aggregation framework. Instead you need to use the JavaScript processing of mapReduce in order to access the property naming that they use:
db.coll.mapReduce(
function() {
emit( this.source.$ref, this["total_price"] )
},
function(key,values) {
return Array.sum( values );
},
{
"query": { "sold_at": { "$gte": start, "$lt": end } },
"out": { "inline": 1 }
}
)
You really should not be using DBRef at all. The usage is basically deprecated now and if you feel you need some external referencing then you should be "manually referencing" this with your own code or implemented by some other library, with which you can do so in a much more supported way.
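A minimal sketch of what "manual referencing" could look like for this case, in plain JavaScript so it runs without a database. The field names source_type and source_id are assumptions made for illustration (they are not from the question), as are the sample values and the totalsBySourceType helper.

```javascript
// Hypothetical manually referenced shape: the referenced collection name and id
// are stored as ordinary fields instead of a DBRef, so no field starts with "$".
const transactions = [
  { source_type: "webpage", source_id: "1", total_price: 100 },
  { source_type: "webpage", source_id: "2", total_price: 50 },
  { source_type: "call_center", source_id: "7", total_price: 25 },
];

// Plain-JavaScript picture of
// { $group: { _id: "$source_type", total: { $sum: "$total_price" } } }
function totalsBySourceType(docs) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc.source_type, (totals.get(doc.source_type) || 0) + doc.total_price);
  }
  return [...totals].map(([type, total]) => ({ _id: type, total }));
}

const totals = totalsBySourceType(transactions);
console.log(totals);
// [ { _id: 'webpage', total: 150 }, { _id: 'call_center', total: 25 } ]
```

With ordinary field names like these, the grouping key can be referenced directly in an aggregation pipeline.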

retrieving multiple (transformed) values in mongodb group matching a criterion from the group itself

I am finding my way through mongodb and have a collection that contains some documents of this shape:
{
"_id" : ObjectId("547a13b70dc5d228db81c475"),
"INSTRUMENT" : "InstrumentA",
"BID" : 5287,
"ASK" : 5290,
"TIMESTAMP" : ISODate("2014-10-01T23:57:27.137Z")
}
{
"_id" : ObjectId("547a0da20dc5d228db2f034d"),
"INSTRUMENT" : "InstrumentB",
"BID" : 0.88078,
"ASK" : 0.88098,
"TIMESTAMP" : ISODate("2014-10-01T23:58:59.637Z")
}
What I am looking to get is the last known mid (BID + ASK)/2 before a given ISODate for each INSTRUMENT. I got as far as getting the time of the last information across instruments and the last value of that last instrument. Even though the following looks like it works, the lastOccurance is being polluted across instruments.
db.runCommand(
{
group:
{
ns: 'collectionTest',
key : { INSTRUMENT : 1} ,
cond: { TIMESTAMP: { $lte: ISODate("2014-10-01 08:30:00") } } ,
$reduce: function( curr, result ) {
if(curr.TIMESTAMP > result.lastOccurance)
{
result.lastOccurance = curr.TIMESTAMP;
result.MID = (curr.BID + curr.ASK)/2;
result.INSTRUMENT = curr.INSTRUMENT;
}else
{
result.lastOccurance = null;
result.MID = null;
result.INSTRUMENT = null;
}
},
initial: { lastOccurance : ISODate("1900-01-01 00:00:00") }
}
}
)
If anybody can see a fix for this code, please let me know.
It's better to use aggregate instead of group whenever possible because it provides better performance and supports sharding.
With aggregate you can do this as:
db.test.aggregate([
// Only include the docs prior to the given date
{$match: {TIMESTAMP: { $lte: ISODate("2014-10-01 08:30:00") }}},
// Sort them in descending TIMESTAMP order
{$sort: {TIMESTAMP: -1}},
// Group them by INSTRUMENT, taking the first one in each group (which will be
// the last one before the given date) and computing the MID value for it.
{$group: {
_id: '$INSTRUMENT',
MID: {$first: {$divide: [{$add: ['$BID', '$ASK']}, 2]}},
lastOccurance : {$first: '$TIMESTAMP'}
}}
])
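To see why sorting in descending order and then taking $first gives the last tick per instrument, here is the same sequence of steps in plain JavaScript (no database needed). The lastMidBefore helper and the extra sample ticks are made up for illustration; the field names come from the question.

```javascript
// Emulates $match (TIMESTAMP <= cutoff), $sort (TIMESTAMP desc) and
// $group with $first, computing MID = (BID + ASK) / 2 for the kept tick.
function lastMidBefore(ticks, cutoff) {
  const sorted = ticks
    .filter((t) => t.TIMESTAMP <= cutoff)
    .sort((a, b) => b.TIMESTAMP - a.TIMESTAMP);
  const groups = new Map();
  for (const t of sorted) {
    // The first tick seen per instrument is the newest one before the cutoff.
    if (!groups.has(t.INSTRUMENT)) {
      groups.set(t.INSTRUMENT, {
        _id: t.INSTRUMENT,
        MID: (t.BID + t.ASK) / 2,
        lastOccurance: t.TIMESTAMP,
      });
    }
  }
  return [...groups.values()];
}

// Illustrative data: two ticks from the question plus two invented ones.
const ticks = [
  { INSTRUMENT: "InstrumentA", BID: 5280, ASK: 5284, TIMESTAMP: new Date("2014-10-01T20:00:00Z") },
  { INSTRUMENT: "InstrumentA", BID: 5287, ASK: 5290, TIMESTAMP: new Date("2014-10-01T23:57:27.137Z") },
  { INSTRUMENT: "InstrumentA", BID: 5300, ASK: 5302, TIMESTAMP: new Date("2014-10-02T01:00:00Z") },
  { INSTRUMENT: "InstrumentB", BID: 0.88078, ASK: 0.88098, TIMESTAMP: new Date("2014-10-01T23:58:59.637Z") },
];
const mids = lastMidBefore(ticks, new Date("2014-10-02T00:00:00Z"));
console.log(mids);
```

Each instrument gets its own entry, so one instrument's values cannot overwrite another's as they did in the reduce function.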

In Mongo, how do I only display documents with the highest value for a key that they share?

Say I have the following four documents in a collection called "Store":
{ item: 'chair', modelNum: 1154, votes: 75 }
{ item: 'chair', modelNum: 1152, votes: 16 }
{ item: 'table', modelNum: 1017, votes: 24 }
{ item: 'table', modelNum: 1097, votes: 52 }
I would like to find only the documents with the highest number of votes for each item type.
The result of this simple example would return modelNum: 1154 and modelNum: 1097, showing me the most popular model of chair and table based on the vote scores entered by customers.
What is the best way to write this query and sort the results by votes in descending order? I'm developing with Meteor, but I don't think that should have an impact.
Store.find({????}).sort({votes: -1});
You can use $first or $last aggregation operators to achieve what you want. These operators are only useful when $group follows $sort. An example using $first:
db.collection.aggregate([
// Sort by "item" ASC, "votes" DESC
{"$sort" : {item : 1, votes : -1}},
// Group by "item" and pick the first "modelNum" (which will have the highest votes)
{"$group" : {_id : "$item", modelNum : {"$first" : "$modelNum"}}}
])
Here's the output:
{
"result" : [
{
"_id" : "table",
"modelNum" : 1097
},
{
"_id" : "chair",
"modelNum" : 1154
}
],
"ok" : 1
}
If you are looking to do this in Meteor and on the client, I would just use an each loop and a basic find. Minimongo keeps the data in memory, so I don't think additional find calls are expensive.
Like this:
Template.itemsList.helpers({
items: function(){
var itemNames = Store.find({}, {fields: {item: 1}}).map(
function( item ) { return item.item; }
);
var itemsMostVotes = _.uniq( itemNames ).map(
function( item ) {
return Store.findOne({item: item}, {sort: {votes: -1}});
}
);
return itemsMostVotes;
}
});
I have switched to findOne so this returns an array of objects rather than a cursor as find would. If you really want the cursor then you could query minimongo with the _ids from itemsMostVotes.
You could also use the underscore groupBy and sortBy functions to do this.
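A sketch of that group-then-pick-the-top idea, written in plain JavaScript rather than with underscore so it runs on its own. The topVotedPerItem helper is a made-up name; the documents are the four from the question.

```javascript
// Keeps the document with the most votes for each item,
// then orders the winners by votes, highest first.
function topVotedPerItem(docs) {
  const best = new Map();
  for (const doc of docs) {
    const current = best.get(doc.item);
    if (!current || doc.votes > current.votes) {
      best.set(doc.item, doc);
    }
  }
  return [...best.values()].sort((a, b) => b.votes - a.votes);
}

const store = [
  { item: "chair", modelNum: 1154, votes: 75 },
  { item: "chair", modelNum: 1152, votes: 16 },
  { item: "table", modelNum: 1017, votes: 24 },
  { item: "table", modelNum: 1097, votes: 52 },
];
const winners = topVotedPerItem(store);
console.log(winners.map((d) => d.modelNum)); // [ 1154, 1097 ]
```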
You would need to use the aggregation framework:
db.Store.aggregate(
{$group:{_id:"$item", "maxVotes": {$max:"$votes"}}}
);

How to count the number of documents on date field in MongoDB

Scenario: I have the following collection in MongoDB:
{
"_id" : "CustomeID_3723",
"IsActive" : "Y",
"CreatedDateTime" : "2013-06-06T14:35:00Z"
}
Now I want to know the count of documents created on a particular day (say, 2013-03-04).
So, I am trying to find the solution using aggregation framework.
Information:
So far I have the following query built:
collection.aggregate([
    { $group: {
        _id: '$CreatedDateTime'
    }},
    { $group: {
        _id: null,
        count: { $sum: 1 }
    }},
    { $project: {
        _id: 0,
        "count": "$count"
    }}
])
Issue: The query above gives me a count, but not one based on the date alone; it also takes the time into account when determining uniqueness.
Question: Given that the field holds an ISO date, can anyone tell me how to count the documents by date only (i.e. excluding the time)?
Replace your two groups with:
{$project: {day: {$dayOfMonth: '$CreatedDateTime'}, month: {$month: '$CreatedDateTime'}, year: {$year: '$CreatedDateTime'}}},
{$group: {_id: {day: '$day', month: '$month', year: '$year'}, count: {$sum: 1}}}
You can read more about the date operators here: http://docs.mongodb.org/manual/reference/aggregation/#date-operators
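Here is a plain-JavaScript sketch of what that $project + $group pair computes, runnable without a database. It assumes the day, month and year are taken in UTC and that the stored value can be parsed as a date; the countPerDay helper and the sample documents are made up for illustration.

```javascript
// Emulates projecting day/month/year out of CreatedDateTime and then
// grouping on that triple with a { $sum: 1 } counter.
function countPerDay(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const d = new Date(doc.CreatedDateTime);
    const id = {
      day: d.getUTCDate(),
      month: d.getUTCMonth() + 1, // JavaScript months are 0-based
      year: d.getUTCFullYear(),
    };
    const key = `${id.year}-${id.month}-${id.day}`;
    const group = groups.get(key) || { _id: id, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  return [...groups.values()];
}

// Illustrative data only.
const created = [
  { _id: "a", CreatedDateTime: "2013-06-06T14:35:00Z" },
  { _id: "b", CreatedDateTime: "2013-06-06T23:59:59Z" },
  { _id: "c", CreatedDateTime: "2013-06-07T00:00:01Z" },
];
const perDay = countPerDay(created);
console.log(perDay);
```

The time of day no longer affects the grouping key, so documents created on the same calendar day are counted together.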