allowDiskUse not working in pymongo - mongodb

I have data stored in MongoDB in the following format.
{
"_id" : ObjectId("570b487fb5360dd1e5ef840c"),
"internal_id" : 1,
"created_at" : ISODate("2015-07-14T10:08:38.994Z"),
"updated_at" : ISODate("2016-01-10T00:35:19.748Z"),
"ad_account_id" : 1,
"updated_time" : "2013-08-05T04:48:49-0700",
"created_time" : "2013-08-05T04:46:35-0700",
"name" : "Sale1",
"daily": [
{"clicks": 5000, "date": "2015-04-16"},
{"clicks": 5100, "date": "2015-04-17"},
{"clicks": 5030, "date": "2015-04-20"}
],
"custom_tags" : {
"Event" : {
"name" : "Clicks"
},
"Objective" : {
"name" : "Sale"
},
"Image" : {
"name" : "43c3fe7b262cde5f476ed303e472c65a"
},
"Goal" : {
"name" : "10"
},
"Type" : {
"name" : "None"
},
"Call To Action" : {
"name" : "None",
},
"Landing Pages" : {
"name" : "www.google.com",
}
}
}
I am trying to group individual documents by internal_id to find the aggregate sum of clicks from say 2015-04-15 to 2015-04-21 using the aggregate method.
In pymongo, when I try to do an aggregate using just $project on internal_id, I get the results, but when I try to $project custom_tags fields, I get the following error:
OperationFailure: Exceeded memory limit for $group, but didn't allow external sort.
Pass allowDiskUse:true to opt in.
Following the answer here, I even changed my aggregate function to list(collection._get_collection().aggregate(mongo_query["pipeline"], allowDiskUse=True)). But this still keeps throwing the earlier error.

Take a look at this link:
Can't get allowDiskUse:True to work with pymongo
This works for me:
someSampleList= db.collectionName.aggregate(pipeline, allowDiskUse=True)
where
pipeline = [
{'$sort': {'sortField': 1}},
{'$group': {'_id': '$distinctField'}},
{'$limit': 20000}]

Try passing the option as a keyword argument; the JavaScript-style {allowDiskUse : true} is not valid Python:
list(collection._get_collection().aggregate(mongo_query["pipeline"], allowDiskUse=True))
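For reference, here is a minimal sketch of the call shape in PyMongo 3.x, where allowDiskUse is passed as a keyword argument to aggregate(). The helper names build_clicks_pipeline and aggregate_with_disk_use are hypothetical, and the pipeline is only an assumed reading of the question (sum daily.clicks per internal_id over a date range). The collection argument is assumed to be a PyMongo collection, such as the one returned by _get_collection().

```python
def build_clicks_pipeline(start_date, end_date):
    """Sum daily clicks per internal_id for dates in [start_date, end_date].

    Dates are 'YYYY-MM-DD' strings as in the sample document, so plain
    string comparison orders them correctly.
    """
    return [
        {"$unwind": "$daily"},
        {"$match": {"daily.date": {"$gte": start_date, "$lte": end_date}}},
        {"$group": {"_id": "$internal_id", "clicks": {"$sum": "$daily.clicks"}}},
    ]


def aggregate_with_disk_use(collection, pipeline):
    # allowDiskUse is a keyword argument, not a dict passed positionally.
    return list(collection.aggregate(pipeline, allowDiskUse=True))
```

A call would then look like aggregate_with_disk_use(collection._get_collection(), build_clicks_pipeline("2015-04-15", "2015-04-21")). Matching on the date range before the $group stage also reduces how much data the group has to hold in memory.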

Related

MongoDB - find document whose array length is less than or equal to 5

Can't we pass an object to the $size operator in mongoose? Is there any way to query on an array's length, so that we can fetch documents containing an array of a particular length?
Here is a sample document:
{
"_id" : ObjectId("5e8c9becd1257f66c4b8cd63"),
"index" : 0,
"name" : "Aurelia Gonzales",
"isActive" : false,
"registered" : ISODate("2015-02-11T09:52:39.000+05:30"),
"age" : 20,
"gender" : "female",
"eyeColor" : "green",
"favoriteFruit" : "banana",
"company" : {
"title" : "YURTURE",
"email" : "aureliagonzales#yurture.com",
"phone" : "+1 (940) 501-3963",
"location" : {
"country" : "USA",
"address" : "694 Hewes Street"
}
},
"tags" : [
"enim",
"id",
"velit",
"ad",
"consequat"
]
}
Here is the query:
db.admin.aggregate([
{
$match : {tags : {$size : {$lte : 5}}}
}
])
Here is the output:
{
"message" : "$size needs a number",
"ok" : 0,
"code" : 2,
"codeName" : "BadValue",
"name" : "MongoError"
}
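You can't use $size like that: the query operator $size only accepts an exact number, not a range condition. To compare the length against a range, you need the aggregation $size operator instead.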
Query :
db.collection.find({
$expr: { /** Allows the use of aggregation expressions within the query language */
$lte: [
{
$size: "$tags"
},
5
]
}
})
Test : MongoDB-Playground
If the size of the array matters this much, though, you could store it in the documents and index it, which makes such queries much faster.
Following similar logic, another solution is a two-stage aggregation using $addFields with $size, followed by $match with $lte.
db.collection.aggregate([
{
$addFields: {
sizeOfTags: {
$size: "$tags"
}
}
},
{
$match: {
sizeOfTags: {
$lte: 5
}
}
}
])
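Since the question started from a driver, here is a rough sketch of the same filter built as a Python dict, which could be passed to a PyMongo find() call. The function names and the field parameter are hypothetical. The second helper shows a commonly used alternative that avoids $expr: an array with at most N items has no element at index N.

```python
def tags_at_most(max_len, field="tags"):
    """Filter using the aggregation $size operator inside $expr (MongoDB 3.6+)."""
    return {"$expr": {"$lte": [{"$size": "$" + field}, max_len]}}


def tags_at_most_by_index(max_len, field="tags"):
    """Alternative: an array of at most max_len items has no element at index max_len.

    Unlike the $expr version, this also matches documents where the
    field is missing or is not an array.
    """
    return {"{}.{}".format(field, max_len): {"$exists": False}}
```

Usage would be along the lines of collection.find(tags_at_most(5)). Note that the $expr form raises an error on documents where the field is not an array, so the two filters are not interchangeable on mixed data.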

How do I exclude a specific object from an array in mongodb?

I have a collection with the following structure:
{
"_id" : ObjectId("59ef54445134d7d70e1cf531"),
"CustomerId" : "Gym_2",
"History" : [
{
"Created_At" : ISODate("2017-10-24T14:54:59Z"),
"Unit" : 600,
"ReferenceCode" : "1cd15b4d-bc42-4a51-a8b3-307db6dc3dee",
},
{
"Created_At" : ISODate("2017-10-28T00:22:19Z"),
"Sent" : true
},
{
"Created_At" : ISODate("2017-10-29T10:22:23Z"),
"Unit" : 600,
"ReferenceCode" : "998e7fce-8a1c-4f7c-b48c-c02cb5c5ad5c",
}
]
}
{
"_id" : ObjectId("59ef54465134d7d70e1cf534"),
"CustomerId" : "Gym_1",
"History" : [
{
"Created_At" : ISODate("2017-10-24T14:55:02Z"),
"Unit" : 600,
"ReferenceCode" : "d19ebeec-bd81-4a0a-aed5-006f746b50ff",
},
{
"Unit" : 600,
"ReferenceCode" : "a991504f-be1f-4e77-b59f-fba73c59e6f1",
"Created_At" : ISODate("2017-10-26T13:51:14Z")
}
]
}
I'm trying to build a query that returns only CustomerId along with history objects that do not have the "Sent" field set.
The result should look like this:
{
"_id" : ObjectId("59ef54445134d7d70e1cf531"),
"CustomerId" : "Gym_2",
"History" : [
{
"Created_At" : ISODate("2017-10-24T14:54:59Z"),
"Unit" : 600,
"ReferenceCode" : "1cd15b4d-bc42-4a51-a8b3-307db6dc3dee",
},
{
"Created_At" : ISODate("2017-10-29T10:22:23Z"),
"Unit" : 600,
"ReferenceCode" : "998e7fce-8a1c-4f7c-b48c-c02cb5c5ad5c",
}
]
}
{
"_id" : ObjectId("59ef54465134d7d70e1cf534"),
"CustomerId" : "Gym_1",
"History" : [
{
"Created_At" : ISODate("2017-10-24T14:55:02Z"),
"Unit" : 600,
"ReferenceCode" : "d19ebeec-bd81-4a0a-aed5-006f746b50ff",
},
{
"Unit" : 600,
"ReferenceCode" : "a991504f-be1f-4e77-b59f-fba73c59e6f1",
"Created_At" : ISODate("2017-10-26T13:51:14Z")
}
]
}
The closest I could reach is the following query:
db.Customers.aggregate([
{$project:{"Sent":{$exists:false},count:{$size:"$History" }}}
]);
But I get "errmsg" : "Unrecognized expression '$exists'" .
How can I achieve this result?
There is a solution for your problem, and the aggregation framework is the right way to achieve it. To modify a nested array you have to:
Unwind the array ($unwind)
Filter out the elements where Sent exists ($match)
Group by the common properties ($group)
Project the data back into its original form ($project)
db.Customers.aggregate([
{$unwind: "$History"},
{$match: {"History.Sent": {$exists: false}}},
{$group: {"_id": { "_id": "$_id", "CustomerId": "$CustomerId" }, History: { $push: "$History"} }},
{$project: { "_id": "$_id._id", "CustomerId": "$_id.CustomerId", History: 1}}
]);
As you can see, this query is rather complicated, and on larger collections you may run into performance problems because it does much more than simple filtering. So although it works, I'd suggest changing your data model, for instance storing each history item as a separate document like this:
{
_id: "some_id",
"Created_At" : ISODate("2017-10-24T14:54:59Z"),
"CustomerId" : "Gym_2",
"Unit" : 600,
"Sent" : true, // can be set or not
"ReferenceCode" : "1cd15b4d-bc42-4a51-a8b3-307db6dc3dee"
}
Then your query will be just simple find with $exists.
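If changing the data model is not an option, a single $project stage with $filter may be a lighter alternative to the unwind/group approach. The following is only a sketch, written as a Python pipeline for a driver such as PyMongo. It assumes MongoDB 3.4 or newer, since $filter arrived in 3.2 and the aggregation $type operator in 3.4. Both function names are hypothetical, and drop_sent_entries is just a plain-Python mirror of the stage for checking results locally.

```python
def history_without_sent_pipeline():
    """Keep only History entries that have no Sent field at all."""
    return [
        {
            "$project": {
                "CustomerId": 1,
                "History": {
                    "$filter": {
                        "input": "$History",
                        "as": "h",
                        # $type yields "missing" when the field is absent
                        "cond": {"$eq": [{"$type": "$$h.Sent"}, "missing"]},
                    }
                },
            }
        }
    ]


def drop_sent_entries(doc):
    """Plain-Python equivalent of the $filter stage above."""
    return {
        "_id": doc.get("_id"),
        "CustomerId": doc["CustomerId"],
        "History": [h for h in doc["History"] if "Sent" not in h],
    }
```

Unlike the $unwind version, this keeps customers whose History ends up empty, which may or may not be what you want.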

Embed root field in a subdocument within an aggregation pipeline

Maybe someone can help me with Mongo's Aggregation Pipeline. I am trying to put an object inside another object, but I'm new to Mongo and it's very difficult:
{
"_id" : ObjectId("5888a74f137ed66828367585"),
"name" : "Unis",
"tags" : [...],
"editable" : true,
"token" : "YfFzaoNvWPbvyUmSulXfMPq4a9QgGxN1ElIzAUmSJRX4cN7zCl",
"columns" : [...],
"description" : "...",
"sites" : {
"_id" : ObjectId("5888ae2f137ed668fb95a03d"),
"url" : "www.....de",
"column_values" : [
"University XXX",
"XXX",
"false"
],
"list_id" : ObjectId("5888a74f137ed66828367585")
},
"scan" : [
{
"_id" : ObjectId("5888b1074e2123c22ae7f4d3"),
"site_id" : ObjectId("5888ae2f137ed668fb95a03d"),
"scan_group_id" : ObjectId("5888a970a7f75fbd49052ed6"),
"date" : ISODate("2017-01-18T16:00:00Z"),
"score" : "B",
"https" : false,
"cookies" : 12
}
]
}
I want to put every object in the "scan"-array into "sites". So that it looks like this:
{
"_id" : ObjectId("5888a74f137ed66828367585"),
"name" : "Unis",
"tags" : [...],
"editable" : true,
"token" : "YfFzaoNvWPbvyUmSulXfMPq4a9QgGxN1ElIzAUmSJRX4cN7zCl",
"columns" : [...],
"description" : "...",
"sites" : {
"_id" : ObjectId("5888ae2f137ed668fb95a03d"),
"url" : "www.....de",
"column_values" : [
"University XXX",
"XXX",
"false"
],
"list_id" : ObjectId("5888a74f137ed66828367585"),
"scan" : [
{
"_id" : ObjectId("5888b1074e2123c22ae7f4d3"),
"site_id" : ObjectId("5888ae2f137ed668fb95a03d"),
"scan_group_id" : ObjectId("5888a970a7f75fbd49052ed6"),
"date" : ISODate("2017-01-18T16:00:00Z"),
"score" : "B",
"https" : false,
"cookies" : 12
}
]
}
}
Is there a step in the aggregation pipeline to perform this task?
With a single pipeline I don't see any other way but specifying each field individually as:
db.collection.aggregate([
{
"$project": {
"name": 1, "tags": 1,
"editable": 1,
"token": 1, "columns": 1,
"description": 1,
"sites._id": "$sites._id",
"sites.url": "$sites.url" ,
"sites.column_values": "$sites.column_values" ,
"sites.list_id": "$sites.list_id",
"sites.scan": "$scan"
}
}
])
With MongoDB 3.4 and newer, you can use the $addFields pipeline step instead of specifying all fields using $project. The advantage is that it adds new fields to documents and outputs documents that contain all existing fields from the input documents and the newly added fields:
db.collection.aggregate([
{
"$addFields": {
"sites._id": "$sites._id",
"sites.url": "$sites.url" ,
"sites.column_values": "$sites.column_values" ,
"sites.list_id": "$sites.list_id",
"sites.scan": "$scan"
}
}, { "$project": { "scan": 0 } }
])
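Because $addFields with dot notation adds to an embedded document without dropping its existing fields, the pipeline can likely be reduced to the sites.scan assignment alone. Here is a sketch of that shorter form as a Python pipeline, assuming MongoDB 3.4+ and a driver such as PyMongo. The function names are hypothetical, and embed_scan is only a plain-Python mirror of the reshaping, useful for checking the expected document shape.

```python
def embed_scan_pipeline():
    """Copy the root-level scan array into sites, then drop the original."""
    return [
        {"$addFields": {"sites.scan": "$scan"}},
        {"$project": {"scan": 0}},
    ]


def embed_scan(doc):
    """Plain-Python equivalent of the two stages above."""
    result = {key: value for key, value in doc.items() if key != "scan"}
    # Keep every existing field of sites and add scan next to them.
    result["sites"] = dict(doc["sites"], scan=doc["scan"])
    return result
```

The other fields of sites (_id, url, column_values, list_id) do not need to be re-listed in this form.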

Get matched embedded document(s) from array

I've got a lot of documents using the following structure in MongoDB:
{
"_id" : ObjectId("..."),
"plant" : "XY_4711",
"hour" : 1473321600,
"units" : [
{
"_id" : ObjectId("..."),
"unit_id" : 10951,
"values" : [
{
"quarter" : 1473321600,
"value" : 395,
},
{
"quarter" : 1473322500,
"value" : 402,
},
{
"quarter" : 1473323400,
"value" : 406,
},
{
"quarter" : 1473324300,
"value" : 410,
}
]
}
]
}
Now I need to find all embedded document values where the quarter is between some given timestamps (eg: { $gte: 1473324300, $lte: 1473328800 }).
I've only got the unit_id and the quarter timestamp from/to for filtering the documents. And I only need the quarter and value grouped and ordered by unit.
I'm new to MongoDB and have read a bit about find() and aggregate(), but I don't know how to do this. MongoDB 3.0 is installed on the server.
Finally I've got it:
I simply have to take each array apart, filter out the things I don't need, and put it back together:
db.collection.aggregate([
{$match : {$and : [{"units.values.quarter" : {$gte : 1473324300}}, {"units.values.quarter" : {$lte : 1473328800 }}]}},
{$unwind: "$units"},
{$unwind: "$units.values"},
{$match : {$and : [{"units.values.quarter" : {$gte : 1473324300}}, {"units.values.quarter" : {$lte : 1473328800 }}]}},
{$project: {"units": {values: {quarter: 1, "value": 1}, unit_id: 1}}},
{$group: {"_id": "$units.unit_id", "quarter_values": {$push: "$units.values"}}} ,
{$sort: {"_id": 1}}
])
Will give:
{
"_id" : 10951,
"quarter_values" : [
{
"quarter" : 1473324300,
"value" : 410
},
{
"quarter" : 1473325200,
"value" : 412
},
{
"quarter" : 1473326100,
"value" : 412
},
{
"quarter" : 1473327000,
"value" : 411
},
{
"quarter" : 1473327900,
"value" : 408
},
{
"quarter" : 1473328800,
"value" : 403
}
]
}
See: Return only matched sub-document elements within a nested array for a detailed description!
I think I have to switch to $map or $filter in the future. Thanks to notionquest for supporting my questions :)
Please see the sample query below. I didn't exactly get your grouping requirement. However, with this sample query you should be able to change and get your desired output.
db.collection.aggregate([
{$unwind : {path : "$units"}},
{$match : {$and : [{"units.values.quarter" : {$gte : 1473324300}}, {"units.values.quarter" : {$lte : 1473328800 }}]}},
{$project : {"units" : {values : {quarter : 1, "value" : 1}, unit_id : 1}}},
{$group : { _id : "$units.unit_id", quarter_values : { $push :{ quarter : "$units.values.quarter", value : "$units.values.value"}}}},
{$sort : {_id : 1 }}
]);
Sample output:
{
"_id" : 10951,
"quarter_values" : [
{
"quarter" : [
1473321600,
1473322500,
1473323400,
1473324300
],
"value" : [
395,
402,
406,
410
]
}
]
}
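As mentioned above, $filter is an option on newer servers. The following is a sketch of that variant as a Python pipeline for a driver such as PyMongo. It assumes MongoDB 3.2 or newer, so it would not run on the 3.0 server from the question. The function name and its parameters are hypothetical.

```python
def quarter_values_pipeline(unit_ids, start, end):
    """Group quarter/value pairs per unit for quarters in [start, end]."""
    in_range = {
        "$and": [
            {"$gte": ["$$v.quarter", start]},
            {"$lte": ["$$v.quarter", end]},
        ]
    }
    wanted_units = {"units.unit_id": {"$in": unit_ids}}
    return [
        {"$match": wanted_units},   # skip documents without any wanted unit
        {"$unwind": "$units"},
        {"$match": wanted_units},   # drop the other units of matched documents
        {"$project": {
            "unit_id": "$units.unit_id",
            "values": {
                "$filter": {"input": "$units.values", "as": "v", "cond": in_range}
            },
        }},
        {"$unwind": "$values"},
        {"$group": {
            "_id": "$unit_id",
            "quarter_values": {
                "$push": {"quarter": "$values.quarter", "value": "$values.value"}
            },
        }},
        {"$sort": {"_id": 1}},
    ]
```

A call such as quarter_values_pipeline([10951], 1473324300, 1473328800) would filter each values array in place before the second $unwind, so only matching entries are expanded into separate documents.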

Conditional $inc in MongoDB query

I have a survey system with documents like this:
{
"_id" : ObjectId("555b0b33ed26911e080102c4"),
"question" : "survey",
"subtitle" : "",
"answers" : [
{
"title" : "option 1",
"color" : "#FFEC00",
"code" : "opt1",
"_id" : ObjectId("555b0b33ed26911e080102ce"),
"votes" : 0,
"visible" : true
},
{
"title" : "option 2",
"color" : "#0bb2ff",
"code" : "opt2",
"_id" : ObjectId("555b0b33ed26911e080102cd"),
"votes" : 0,
"visible" : true
}
]
}
Now I'm working on vote submission, so I need to increase the 'votes' field for a specific survey (depending on the option selected by the user).
My problem is: I can have multiple documents like that, so how can I $inc the votes field inside this array for a specific document? I tried this query (based on this website), but it didn't work:
db.bigsurveys.update(
{_id: ObjectId('555b0b33ed26911e080102c4'), 'answers.$.code' : 'opt1'},
{ $inc : { 'answers.$.votes' : 1 } }
)
The main problem here is that I can have multiple documents like this. Thanks in advance!
Use $elemMatch and the positional operator $ in the update query:
db.bigsurveys.update({
"_id": ObjectId("555b0b33ed26911e080102c4"),
"answers": {
"$elemMatch": {
"code": "opt1"
}
}
}, {
"$inc": {
"answers.$.votes": 1
}
})
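For anyone calling this from Python, here is a sketch of the same update using PyMongo's update_one. The function names are hypothetical. The collection argument is assumed to be a PyMongo collection, and survey_id is assumed to be a bson ObjectId.

```python
def vote_update(survey_id, answer_code):
    """Build the filter and update documents for a single vote."""
    query = {
        "_id": survey_id,
        "answers": {"$elemMatch": {"code": answer_code}},
    }
    # "$" refers to the first array element matched by the filter.
    update = {"$inc": {"answers.$.votes": 1}}
    return query, update


def submit_vote(collection, survey_id, answer_code):
    """Return True if exactly one survey document was modified."""
    query, update = vote_update(survey_id, answer_code)
    result = collection.update_one(query, update)
    return result.modified_count == 1
```

Filtering on _id limits the update to one survey, so having many documents with the same answer codes is not a problem.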