My documents look like this:
{
"_id": ObjectId("5698fcb5585b2de0120eba31"),
"id": "26125242313",
"parent_id": "26125241841",
"link_id": "10024080",
"name": "26125242313",
"author": "gigaquack",
"body": "blogging = creative writing",
"subreddit_id": "6",
"subreddit": "reddit.com",
"score": "27",
"created_utc": "2007-10-22 18:39:31"
}
What I'm trying to do is write a query that finds users who posted to only one subreddit. I did this in SQL with the query:
Select distinct author, subreddit from reddit group by author having count(*) = 1;
I'm trying to do something similar in MongoDB but am having some trouble.
I managed to recreate SELECT DISTINCT with a $group stage in aggregate, but I can't figure out how to do the HAVING COUNT part.
This is what my query looks like:
db.collection.aggregate(
[{"$group":
{ "_id": { author: "$author", subreddit: "$subreddit" } } },
{$match:{count:1}} // This part is not working
])
Am I using $match wrong?
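Your $group stage never computes a count field, so $match has nothing to compare against. Add a $sum accumulator to the group; your query should look like this: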
db.collection.aggregate([{
'$group': {
'_id': {'author': '$author', 'subreddit': '$subreddit'},
'count': {'$sum': 1},
'data': {'$addToSet': '$$ROOT'}}
}, {
'$match': {
'count': {'$eq': 1}
}}])
Here data is a one-element list containing the matched document.
If you only want a specific field, it should look like this:
db.collection.aggregate([{
'$group': {
'_id': {'author': '$author', 'subreddit': '$subreddit'},
'count': {'$sum': 1},
'author': {'$last': '$author'}}
}, {
'$match': {
'count': {'$eq': 1}
}}])
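Note that the pipelines above group on the (author, subreddit) pair, so count equal to 1 selects pairs with exactly one post. If the goal is authors who appear in exactly one subreddit no matter how many posts they made, one option is to group twice. The following is a minimal sketch under that assumption, written as PyMongo-style dicts. The helper authors_with_one_subreddit is a hypothetical plain-Python stand-in for checking the logic without a server, and the sample posts (including the author "alice") are invented for illustration.

```python
from collections import defaultdict

# Pipeline as it might be passed to collection.aggregate() in PyMongo.
# Stage 1 collapses posts to distinct (author, subreddit) pairs,
# stage 2 counts the distinct subreddits per author,
# stage 3 plays the role of HAVING COUNT(*) = 1.
pipeline = [
    {"$group": {"_id": {"author": "$author", "subreddit": "$subreddit"}}},
    {"$group": {"_id": "$_id.author", "subreddits": {"$sum": 1}}},
    {"$match": {"subreddits": 1}},
]


def authors_with_one_subreddit(posts):
    """Plain-Python equivalent of the pipeline above (hypothetical helper)."""
    seen = defaultdict(set)
    for post in posts:
        seen[post["author"]].add(post["subreddit"])
    return sorted(a for a, subs in seen.items() if len(subs) == 1)


# Invented sample data, for illustration only.
sample_posts = [
    {"author": "gigaquack", "subreddit": "reddit.com"},
    {"author": "gigaquack", "subreddit": "reddit.com"},
    {"author": "alice", "subreddit": "reddit.com"},
    {"author": "alice", "subreddit": "programming"},
]
print(authors_with_one_subreddit(sample_posts))  # ['gigaquack']
```

The first $group is what makes repeated posts to the same subreddit count only once.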
Related
1st collection
stocks = [
{"userId" : 1, "groupId": 1, "stockId": 1},
{"userId": 2, "groupId": 1, "stockId": 2},
{"userId": 3, "groupId": 4, "stockId": 3}
]
2nd collection:
items = [
{"userid": 1, "groupId": 1, "itemId": 1},
{"userid": 1, "groupId": 3, "itemId": 2},
{"userid": 1, "groupId": 4, "itemId": 3}
]
I have a user collection from which I get the userid; here I have filtered it down to userid 1. I tried the lookup below and I get all the data for that userid, but when I add a condition to exclude a group, it stops working. Can someone help or point out what I am doing wrong?
{
from: "stocks",
localField: "user.id",
foreignField: "userId",
let: {group_id: "$groupId", user_id: "$userId" },
pipeline: [{ "$unionWith": { coll: "items", pipeline: [{$match: {$userid: "$$user_id", "$groupId": { $nin: ["$$group_id"]}}}]}}],
as: "stock_items"
}
I need the list of documents where userId and groupId are not the same. That is, I need all the data from stocks and items excluding item[0], since both have the same user id and group id.
I'm not entirely sure that I understand what you are asking for. I'm also confused by the sample aggregation that has been provided, because fields like "user.id" don't exist in the sample documents. But at the end you specifically mention:
i need all the data from stocks and items excluding item[0], since both have same user id and group id
Based on that, this answer assumes that you are looking to find all documents in both collections where the value of the groupId field is different than the value of the userId field. If that is correct, then the following query should work:
db.stocks.aggregate([
{
$match: {
$expr: {
$ne: [
"$groupId",
"$userId"
]
}
}
},
{
"$unionWith": {
"coll": "items",
"pipeline": [
{
$match: {
$expr: {
$ne: [
"$groupId",
"$userid" // the sample items documents spell this field "userid"
]
}
}
}
]
}
}
])
Playground demonstration here.
This works by using the $ne aggregation operator to compare the two fields within each document. The $expr operator is needed because it lets $match evaluate an aggregation expression that references other fields of the same document, as shown here in the documentation.
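The effect of the filter and the union on the sample documents can be checked in plain Python without a server. This is only a sketch: the helper differing is hypothetical, and it assumes the items documents store the user under userid (lowercase), as in the question's sample data.

```python
stocks = [
    {"userId": 1, "groupId": 1, "stockId": 1},
    {"userId": 2, "groupId": 1, "stockId": 2},
    {"userId": 3, "groupId": 4, "stockId": 3},
]
items = [
    {"userid": 1, "groupId": 1, "itemId": 1},
    {"userid": 1, "groupId": 3, "itemId": 2},
    {"userid": 1, "groupId": 4, "itemId": 3},
]


def differing(docs, user_field):
    """Keep documents whose groupId differs from the given user field,
    mirroring {$match: {$expr: {$ne: ["$groupId", "$<user_field>"]}}}."""
    return [d for d in docs if d["groupId"] != d.get(user_field)]


# $unionWith combines both filtered result sets; on the server the output
# order is unspecified unless a $sort stage follows.
combined = differing(stocks, "userId") + differing(items, "userid")
for doc in combined:
    print(doc)
```

With this data the filter drops item[0], and it also drops the first stocks document, because its userId and groupId are both 1.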
Note: Each collection contains 96.5k documents, and each collection has these fields:
{
"name": "variable1",
"startTime": "variable2",
"endTime": "variable3",
"classes": "variable4",
"section": "variable"
}
I have 2 collections. I have to compare them and find out whether some specific fields of the documents (here name, startTime, and endTime) are the same in both collections.
My approach was to join the 2 collections and then use $lookup. I also tried the following query, but it didn't work.
Please help me.
col1.aggregate([
{
"$unionWith": {"col1": "col2"}
},
{
"$group":
{
"_id":
{
"Name": "$Name",
"startTime": "$startTime",
"endTime": "$endTime"
},
"count": {"$sum": 1},
"doc": {"$first": "$$ROOT"}
}
},
{
"$match": {"$expr": {"$gt": ["$count", 1]}}
},
{
"$replaceRoot": {"newRoot": "$doc"}
},
{
"$out": "newCollectionWithDuplicates"
}
])
Your approach is fine; you just have a minor syntax error in your $unionWith. It's supposed to look like this:
{
"$unionWith": {
coll: "col2",
pipeline: []
}
}
Mongo Playground
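For reference, here is a sketch of how the whole pipeline might look with that fix applied, written as PyMongo-style dicts. Two things in it are my assumptions rather than part of the answer above: the group key uses "$name" instead of "$Name", since field paths are case-sensitive and the sample document spells the field name, and shared_keys is a hypothetical plain-Python helper for checking the idea on small lists.

```python
from collections import Counter

# Corrected pipeline in PyMongo-style dicts. Field paths are case-sensitive,
# so "$name" is used here to match the "name" field in the sample document.
pipeline = [
    {"$unionWith": {"coll": "col2", "pipeline": []}},
    {"$group": {
        "_id": {"name": "$name", "startTime": "$startTime", "endTime": "$endTime"},
        "count": {"$sum": 1},
        "doc": {"$first": "$$ROOT"},
    }},
    {"$match": {"count": {"$gt": 1}}},
    {"$replaceRoot": {"newRoot": "$doc"}},
    {"$out": "newCollectionWithDuplicates"},
]


def shared_keys(col1_docs, col2_docs, fields=("name", "startTime", "endTime")):
    """Plain-Python check of the same idea (hypothetical helper): return the
    field combinations that occur more than once across both collections."""
    counts = Counter(
        tuple(doc.get(f) for f in fields) for doc in col1_docs + col2_docs
    )
    return [key for key, n in counts.items() if n > 1]
```

One caveat with this approach: a count greater than 1 also flags combinations that are duplicated inside a single collection, not only those shared between the two.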
I am having trouble understanding how to use aggregation pipelines in Mongo.
Given the following list of documents:
db.dishes.insertMany([
{_id: "Vanilla Sundae", keywords: ["vanilla", "ice cream", "desert"] },
{_id: "Vanilla Cake", keywords: ["vanilla", "cake", "baking", "desert"] },
{_id: "Chocolate Cake", keywords: ["chocolate", "cake", "baking", "desert"] }
])
How do I create an aggregate that returns a list of distinct keywords and the number of documents per keyword:
[
{"_id": "vanilla", "count": 2},
{"_id": "ice cream", "count": 1},
{"_id": "desert", "count": 3},
{"_id": "baking", "count": 2},
{"_id": "cake", "count": 2},
{"_id": "chocolate", "count": 1}
]
You can use $unwind to deconstruct the array into one document per keyword, then $group to count the documents for each keyword:
db.collection.aggregate([
{ $unwind: "$keywords" },
{
$group: {
_id: "$keywords",
count: { $sum: 1 }
}
}
])
Working Mongo playground
You can use the $unwind stage combined with the $group stage to achieve this.
db.collection.aggregate([
{ "$unwind": { path: "$keywords" } },
{ "$group": { "_id": "$keywords", "count": { $sum: 1 } } }
])
This should do the trick! :)
I'm attaching the MongoDB playground here.
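To see what the two stages compute, the same counting can be sketched in plain Python on the sample documents. This is only an illustration of the logic, not a replacement for running the pipeline on the server.

```python
from collections import Counter

dishes = [
    {"_id": "Vanilla Sundae", "keywords": ["vanilla", "ice cream", "desert"]},
    {"_id": "Vanilla Cake", "keywords": ["vanilla", "cake", "baking", "desert"]},
    {"_id": "Chocolate Cake", "keywords": ["chocolate", "cake", "baking", "desert"]},
]

# $unwind yields one record per array element; $group with {$sum: 1}
# then counts how many records share each keyword.
keyword_counts = Counter(kw for dish in dishes for kw in dish["keywords"])

# Shape the counts like the documents the pipeline returns.
result = [{"_id": kw, "count": n} for kw, n in keyword_counts.items()]
for row in result:
    print(row)
```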
I have some documents structured like this:
{
"_id": Mongoid,
"relate_id": 1,
"userid": user1
},
{
"_id": Mongoid,
"relate_id": 2,
"userid": user2
},
{
"_id": Mongoid,
"relate_id": 1,
"userid": user3
}
My expected result is below:
{
"relate_id": 1,
"userid": [user1, user3]
},
{
"relate_id": 2,
"userid": [user2]
}
Can I search this structure using one aggregate() query?
Yes. Use the $group aggregation stage with the $push operator to build the list of userid values for each relate_id:
db.collection.aggregate([
{$group: {_id: "$relate_id", userid: {"$push": "$userid"}}},
]
)
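Note that $group returns the grouping key as _id rather than relate_id. If the exact shape from the question is wanted, a $project stage can rename it. The sketch below shows that variant as PyMongo-style dicts, followed by a plain-Python equivalent for checking the logic. It assumes the userid values are strings, which the question does not state.

```python
from collections import defaultdict

# Pipeline variant with a rename step (PyMongo-style dicts).
pipeline = [
    {"$group": {"_id": "$relate_id", "userid": {"$push": "$userid"}}},
    {"$project": {"_id": 0, "relate_id": "$_id", "userid": 1}},
]

# Sample documents from the question, with userid assumed to be a string.
docs = [
    {"relate_id": 1, "userid": "user1"},
    {"relate_id": 2, "userid": "user2"},
    {"relate_id": 1, "userid": "user3"},
]

# Plain-Python equivalent: collect userids per relate_id in input order.
grouped = defaultdict(list)
for doc in docs:
    grouped[doc["relate_id"]].append(doc["userid"])

result = [{"relate_id": key, "userid": users} for key, users in grouped.items()]
print(result)
```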
This is my aggregate query in pymongo:
db.connection_log.aggregate([
{ '$match': {
'login_time': {'$gte': datetime.datetime(2014, 5, 30, 6, 57)}
}},
{ '$group': {
'_id': {
'username': '$username',
'ras_id': '$ras_id',
'user_id': '$user_id'
},
'total': { '$sum': '$type_details.in_bytes'},
'total1': {'$sum': '$type_details.out_bytes'}
}},
{ '$sort': {'total': 1, 'total1': 1}}
])
How do I count all the results of the aggregate?
Add this stage to the end of your aggregation pipeline (in pymongo, quote the keys and write None instead of null):
{ $group: {
_id: null,
count: {
$sum: 1
}
}}
See the SQL to Aggregation Mapping Chart in the MongoDB documentation.
Well if you really want your results with a total count combined then you can always just push the results into their own array:
result = db.connection_log.aggregate([
{ '$match': {
'login_time': {'$gte': datetime.datetime(2014, 5, 30, 6, 57)}
}},
{ '$group': {
'_id': {
'username': '$username',
'ras_id': '$ras_id',
'user_id': '$user_id'
},
'total': { '$sum': '$type_details.in_bytes'},
'total1': {'$sum': '$type_details.out_bytes'}
}},
{ '$sort': {'total': 1, 'total1': 1}},
{ '$group': {
'_id': None,
'results': {
'$push': {
'_id': '$_id',
'total': '$total',
'total1': '$total1'
}
},
'count': { '$sum': 1 }
}}
])
And if you are using MongoDB 2.6 or greater you can just '$push': '$$ROOT' instead of actually specifying all of the document fields there.
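But really, unless you are using MongoDB 2.6 and explicitly asking for a cursor, the result is already returned as an array, so there is no need to add an inner array of results with a count. Just take the length of that array. With PyMongo 2.x the array sits under the 'result' key of the returned dict (in PyMongo 3.0 and later, aggregate() always returns a cursor, so use len(list(result)) instead):
len(result['result'])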
If you are indeed using a cursor for a large result-set or otherwise using $limit and $skip to "page" results then you will need to do two queries with one just summarizing the "total count", but otherwise you just don't need to do this.
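The two-query approach for paging can be sketched by building both pipelines from the same shared stages. The helper names count_pipeline and page_pipeline are hypothetical, as are the skip and limit values; the stages themselves come from the question.

```python
import datetime

# Stages shared by the "page" query and the "count" query (from the question).
base_pipeline = [
    {"$match": {"login_time": {"$gte": datetime.datetime(2014, 5, 30, 6, 57)}}},
    {"$group": {
        "_id": {"username": "$username", "ras_id": "$ras_id", "user_id": "$user_id"},
        "total": {"$sum": "$type_details.in_bytes"},
        "total1": {"$sum": "$type_details.out_bytes"},
    }},
]


def count_pipeline(stages):
    """Append a stage that reduces the results to one document holding their count."""
    return list(stages) + [{"$group": {"_id": None, "count": {"$sum": 1}}}]


def page_pipeline(stages, skip, limit):
    """Append sorting and paging stages (hypothetical helper)."""
    return list(stages) + [
        {"$sort": {"total": 1, "total1": 1}},
        {"$skip": skip},
        {"$limit": limit},
    ]


# Against a live server (not run here), each pipeline would be passed to
# db.connection_log.aggregate(...): one call for the total, one per page.
total_query = count_pipeline(base_pipeline)
first_page_query = page_pipeline(base_pipeline, skip=0, limit=20)
```

Both helpers copy the shared stages instead of mutating them, so the same base_pipeline can be reused for every page.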