Remove all leaves from graph - MongoDB

I've built a relations graph in a MongoDB collection, for example:
{ "user_id": 1, "follower_id": 2 }
{ "user_id": 1, "follower_id": 3 }
{ "user_id": 2, "follower_id": 1 }
{ "user_id": 2, "follower_id": 3 }
{ "user_id": 3, "follower_id": 4 }
{ "user_id": 5, "follower_id": 2 }
This represents a directed graph.
Is there an efficient way to remove "leaves" from the graph? In this example I'd like to remove node 4, because it only has a single link (with node 3), and node 5, because only node 2 links to it.
Or, in graph terminology: only keep vertices with indegree > 1 or outdegree > 1.

The short answer is no - there is no efficient way to do this with a schema like this. It can be done by iterating over all nodes, for example with the aggregation framework, and removing them in a separate operation, but I think that is about all that can be done. Assuming the edges are in a graph collection, it could look something like the code below, but it is far from efficient:
db.graph.aggregate([
    // emit each edge twice, once per endpoint
    {$project: {index: {$literal: [0, 1]}, user_id: 1, follower_id: 1}},
    {$unwind: "$index"},
    {$project: {id: {$cond: [{$eq: ["$index", 0]}, "$user_id", "$follower_id"]}}},
    // count how many edges touch each node
    {$group: {_id: "$id", count: {$sum: 1}}},
    // keep only nodes with a single edge (leaves)
    {$match: {count: {$lte: 1}}}
]).forEach(function(node) {
    // remove the edge regardless of which side the leaf node is on
    db.graph.remove({$or: [{user_id: node._id}, {follower_id: node._id}]});
});
You could use a more document-like schema if you want operations like this to be efficient:
{
    user_id: 1,
    follows: [2, 3],
    followed_by: [2]
}
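With a schema like that, finding leaf users reduces to a query on array sizes. A minimal sketch, assuming a users collection holding documents with the follows/followed_by fields shown above and MongoDB 3.6+ for $expr, using the same "at most one link in total" criterion as the pipeline above:
// find leaf candidates: users with at most one link in total
db.users.find({
    $expr: {
        $lte: [
            {$add: [{$size: "$follows"}, {$size: "$followed_by"}]},
            1
        ]
    }
})
Replacing find with deleteMany would remove them in one statement, although the removed ids would then also have to be pulled from the arrays of the remaining users.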

Related

MongoDB - How to use $bucketAuto aggregation where the buckets are grouped by another property

I need to create an aggregation pipeline that returns price ranges for each product category.
What I want to avoid is loading all available categories and then calling the database again, one by one, with a $match on each category. There must be a better way to do it.
Product documents
{
    Price: 500,
    Category: 'A'
},
{
    Price: 7500,
    Category: 'A'
},
{
    Price: 340,
    Category: 'B'
},
{
    Price: 60,
    Category: 'B'
}
Now I could use a $group stage to group the prices into an array by their category.
{
    _id: "$Category",
    Prices: {
        $addToSet: "$Price"
    }
}
Which would result in
{
    _id: 'A',
    Prices: [500, 7500]
},
{
    _id: 'B',
    Prices: [340, 60]
}
But if I use a $bucketAuto stage after this, I am unable to group by multiple properties, meaning it would not take the categories into account.
I have tried the following:
{
    groupBy: "$Prices",
    buckets: 5,
    output: {
        Count: {$sum: 1}
    }
}
This does not take categories into account, but I need the generated buckets to be organised by category: either by having the category field within the _id as well, or by having it as another field, with 5 buckets for each distinct category:
{
    _id: {min: 500, max: 7500, category: 'A'},
    Count: 2
},
{
    _id: {min: 60, max: 340, category: 'B'},
    Count: 2
}...
Query 1
If you want to group by category and find the max and min price for that category, you can do it like this:
Playmongo
aggregate([
    {"$group": {
        "_id": "$Category",
        "min-price": {"$min": "$Price"},
        "max-price": {"$max": "$Price"}
    }}
])
Query 2
If you want to group by category and then apply the bucketing inside the array of prices, to create 5 buckets as in your example, you can do it with a trick that allows us to use stage operators on the contents of an array.
The trick is to have one extra collection containing only one empty document [{}]: you $lookup into it, unwind the array there, and do whatever you want with it.
Here we unwind the prices array and apply $bucketAuto to it with 5 buckets, as in your example. This way we can group by category and split the prices into 5 ranges (5 buckets).
Playmongo
aggregate([
    {"$group": {"_id": "$Category", "prices": {"$push": "$Price"}}},
    {"$lookup": {
        "from": "coll_with_1_empty_doc",
        "let": {"prices": "$prices", "category": "$_id"},
        "pipeline": [
            {"$set": {"prices": "$$prices"}},
            {"$unwind": "$prices"},
            {"$bucketAuto": {"groupBy": "$prices", "buckets": 5}}
        ],
        "as": "bucket-prices"
    }}
])
If none of the above works, please provide sample documents and the expected output.

Counting data per user with mongo aggregation framework

I have a collection where each document contains a user_ids property, which is an array field. Example documents would be:
[{
    _id: 'i3oi1u31o2yi12o3i1',
    unique_prop: 33,
    prop1: 'some string value',
    prop2: 212,
    user_ids: [1, 2, 3, 4]
},
{
    _id: 'i3oi1u88ffdfi12o3i1',
    unique_prop: 34,
    prop1: 'some string value',
    prop2: 216,
    user_ids: [2, 3, 4]
},
{
    _id: 'i3oi1u8834432ddsda12o3i1',
    unique_prop: 35,
    prop1: 'some string value',
    prop2: 211,
    user_ids: [2]
}]
My goal is to get the number of documents per user, so the sample output would be:
[
    {user_id: 1, count: 1},
    {user_id: 2, count: 3},
    {user_id: 3, count: 2},
    {user_id: 4, count: 2}
]
I've tried a couple of things, none of which worked; lastly I tried:
aggregate([
    {$group: {
        _id: {unique_prop: "$unique_prop"},
        users: {"$addToSet": "$user_ids"},
        count: {"$sum": 1}
    }}
])
But it just returned the users per document. I'm still trying to learn, so any resource or advice would help.
You need to $unwind the "user_ids" array and, in the $group stage, count the number of times each "id" appears in the collection.
db.collection.aggregate([
    {"$unwind": "$user_ids"},
    {"$group": {"_id": "$user_ids", "count": {"$sum": 1}}}
])
MongoDB aggregation performs computations on groups of values from documents in a collection and returns the computed results by executing its stages as a pipeline.
Based on the description above, please try executing the following aggregate query in the MongoDB shell.
db.collection.aggregate(
    // Pipeline
    [
        // Stage 1
        {
            $unwind: "$user_ids"
        },
        // Stage 2
        {
            $group: {
                _id: {user_id: '$user_ids'},
                total: {$sum: 1}
            }
        },
        // Stage 3
        {
            $project: {
                _id: 0,
                user_id: '$_id.user_id',
                count: '$total'
            }
        }
    ]
);
In the above aggregate query, the $unwind operator first breaks the user_ids array field of each document into multiple documents, one per array element; the $group stage then groups these documents by the value of the user_ids field and counts the documents for each value, and the $project stage reshapes the result into {user_id, count} documents.

MongoDB aggregating Likert scale

This seems like an easy question, but I can't seem to figure it out after trying for a substantial amount of time.
I have a mongodb collection that has the schema {user, documentID, rating}. Ratings are on a scale of 1-5, so the collection might look something like:
userA, documentA, 5
userA, documentB, 5
userB, documentA, 1
userC, documentB, 2
(and so on...)
Is there a way I can directly find the count of each rating on a single document with a single query? The desired output is something like:
documentA: {
    "1": 23,
    "2": 24,
    "3": 131,
    "4": 242,
    "5": 500
}
I've read about how to use aggregate to group fields, but I'm not sure how it can be used to return the count of distinct values (i.e. 1-5).
Will really appreciate any help provided!
You can achieve this using aggregation. The query would look like:
db.collection.aggregate([
    {$group: {
        _id: {document: "$document", rating: "$rating"},
        sum: {$sum: 1}
    }}
])
The output would look like:
{_id: {"document": "documentA", "rating": 1}, "sum": 1}
{_id: {"document": "documentA", "rating": 5}, "sum": 1}
{_id: {"document": "documentB", "rating": 2}, "sum": 1}
{_id: {"document": "documentB", "rating": 5}, "sum": 1}

How can I weight each MongoDB search query differently?

I have a MongoDB collection that looks something like this:
{
    "name": "McAllister's Deli",
    "menu": [
        {"sandwich": 4},
        {"spud": 3},
        {"salad": 5},
        {"cookie": 2}
    ],
    "reviews": 45
}
I would like to rank these restaurants based on the types of food they have and the number of reviews. For instance, if someone is looking for cookie and sandwich, McAllister's Deli would return a ranking of, say, 19.28, computed as (cookie * sandwich * reviews) / menuItems. Is there a way to optimize my query to take this ranking into account?
Edit: Since it was asked in a comment, I am currently using the Dart driver, but I am familiar with the Mongo shell and can translate a shell query to a query my driver understands.
db.rest.aggregate([{
    $project: {
        _id: 1,
        sandwich: {$arrayElemAt: ['$menu.sandwich', 0]},
        cookie: {$arrayElemAt: ['$menu.cookie', 0]},
        menuItems: {$size: '$menu'}
    }
}, {
    $project: {
        _id: 1,
        rank: {$divide: [{$add: ['$sandwich', '$cookie']}, '$menuItems']}
    }
}])
The two $project stages can probably be combined; they are separated here for clarity.
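If the goal is the exact (cookie * sandwich * reviews) / menuItems formula from the question, a variant of the same pipeline is sketched below. It assumes every matched restaurant actually has both items on its menu; if one is missing, $arrayElemAt yields no value and the computed rank will not be meaningful (typically null).
db.rest.aggregate([{
    $project: {
        _id: 1,
        name: 1,
        reviews: 1,
        sandwich: {$arrayElemAt: ['$menu.sandwich', 0]},
        cookie: {$arrayElemAt: ['$menu.cookie', 0]},
        menuItems: {$size: '$menu'}
    }
}, {
    $project: {
        _id: 1,
        name: 1,
        rank: {$divide: [{$multiply: ['$sandwich', '$cookie', '$reviews']}, '$menuItems']}
    }
}, {
    // highest-ranked restaurants first
    $sort: {rank: -1}
}])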

MongoDB Aggregation: Combine two arrays

I have the following type of documents stored in a collection.
{
    "_id" : "318036:2014010100",
    "data": [
        {"flow": [6, 10, 12], "occupancy": [0.0356, 0.06, 0.0856], time: 0},
        {"flow": [2, 1, 4], "occupancy": [0.01, 0.0056, 0.0422], time: 30},
        ...
    ]
}
I want to compute an aggregated value from the first, second, ..., nth values of the flow and occupancy arrays. The order within the array should be preserved. Assuming I want to compute the sum, the result should look like the following:
{
    "_id" : "318036:2014010100",
    "data": [
        {"flow": [6, 10, 12], "occupancy": [0.0356, 0.06, 0.0856], sum: [6.0356, 10.06, 12.0856], time: 0},
        {"flow": [2, 1, 4], "occupancy": [0.01, 0.0056, 0.0422], sum: [2.01, 1.0056, 4.0422], time: 30},
        ...
    ]
}
I tried to solve this with the aggregation framework, but my current approach does not preserve the ordering and produces too many sums.
db.sens.aggregate([
    {$match: {"_id": /^318036:/}},
    {$limit: 1},
    {$unwind: "$data"},
    {$unwind: "$data.flow"},
    {$unwind: "$data.occupancy"},
    {
        $group: {
            _id: {id: "$_id", time: "$data.time", o: "$data.occupancy", f: "$data.flow", s: {$add: ["$data.occupancy", "$data.flow"]}}
        }
    },
    {
        $group: {
            _id: {id: "$_id.id", time: "$_id.time"},
            occ: {$addToSet: "$_id.o"},
            flow: {$addToSet: "$_id.f"},
            speed: {$addToSet: "$_id.s"}
        }
    }
])
I am not sure if it is possible to solve this problem with the aggregation framework, so a solution using MapReduce would also be fine. How can I produce the desired result?
An alternative solution, with neither the aggregation framework nor map/reduce:
db.sens.find().forEach(function (doc) {
    doc.data.forEach(function (dataElement) {
        // element-wise sum of the flow and occupancy arrays
        var sumArray = [];
        for (var j = 0; j < dataElement.flow.length; j++) {
            sumArray[j] = dataElement.flow[j] + dataElement.occupancy[j];
        }
        dataElement.sum = sumArray;
    });
    // write the modified document back once per document
    db.sens.save(doc);
});
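If a server-side answer is preferred and the server is recent enough, a sketch using $map, $range and $arrayElemAt (MongoDB 3.4+; $mergeObjects needs 3.6+) computes the element-wise sums while preserving order:
db.sens.aggregate([
    {$addFields: {
        data: {$map: {
            input: "$data",
            as: "d",
            in: {$mergeObjects: ["$$d", {
                // sum[i] = flow[i] + occupancy[i], in array order
                sum: {$map: {
                    input: {$range: [0, {$size: "$$d.flow"}]},
                    as: "i",
                    in: {$add: [
                        {$arrayElemAt: ["$$d.flow", "$$i"]},
                        {$arrayElemAt: ["$$d.occupancy", "$$i"]}
                    ]}
                }}
            }]}
        }}
    }}
])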