BSON Object Response Size problems [duplicate] - mongodb

This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
MongoDB: count the number of items in an array
(3 answers)
Closed 4 years ago.
I'm trying to test MongoDB's performance (university research) by inserting and querying the same data (with the same queries) in different ways, but I'm having some trouble with the response size. I'll try to explain what I'm doing.
My original file has this format:
[{"field":"aaaa","field2":"bbbbb","field3":"12345"},{"field":"cccc","field2":"ddddd","field3":"12345"},{"field":"ffff","field2":"ggggg","field3":"12345"},{"field":"hhhhh","field2":"iiiii","field3":"12345"},{"field":"jjjj","field2":"kkkkk","field3":"12345"},{"field":"lllll","field2":"mmmmm","field3":"12345"}]
1st Approach - I insert the whole file as a single document. MongoDB doesn't accept a top-level array as a document, so I have to wrap the file in an "Array" field, like this: {"Array":[{..},{..},{..},...]}. Once inserted, I query it with
db.collection.aggregate([
    { $match: { _cond_ } },
    { $unwind: "$Array" },
    { $match: { _cond_ } },
    { $group: { _id: null, count: { $sum: 1 }, Array: { $push: "$Array" } } },
    { $project: { "Numero HIT": "$count", Array: 1 } }
])
to retrieve the inner documents and count the number of hits (_cond_ of course is something like "Array.field": "aaaa" or "Array.field": /something to search/).
2nd Approach - I insert each inner document by itself: I split the original file (it's all on one line) into an array, then I loop over it, inserting each element. Then I query it with:
db.collection2.find({field: "aaaa"}) (or field: /something to search/)
I'm using two different collections, one for each approach, each of them around 207-208 MB.
Everything seemed fine; then, running a query with the 1st approach, I got this error:
BSONObj size: 24002272 (0x16E3EE0) is invalid. Size must be between 0 and 16793600(16MB)
I remembered that a response from a MongoDB query MUST be smaller than 16 MB, OK, but how is it possible that approach 1 gives me the error and the SAME* query in approach 2 works fine? And how do I fix it? I mean: OK, the response is >16 MB, but how do I handle it? Is there no way to run this kind of query? I hope it's clear what I mean.
Thanks in advance
*By SAME I mean something like:
1st Approach:
db.collection.aggregate([
    { $match: { "Array.field": "aaa", "Array.field3": 12345 } },
    { $unwind: "$Array" },
    { $match: { "Array.field": "aaa", "Array.field3": 12345 } },
    { $group: { _id: null, count: { $sum: 1 }, Array: { $push: "$Array" } } },
    { $project: { "Numero HIT": "$count", Array: 1 } }
])
2nd Approach:
db.collection2.find({field: "aaa", field3: 12345})
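The error comes from the $group stage of the 1st approach: $push accumulates every matching element into one result document, and a single BSON document must stay under 16 MB, whereas find() in the 2nd approach streams many small documents through a cursor. A sketch that avoids building the oversized document (assuming MongoDB 3.4+ for $replaceRoot and $count):
db.collection.aggregate([
    { $match: { "Array.field": "aaa", "Array.field3": 12345 } },
    { $unwind: "$Array" },
    { $match: { "Array.field": "aaa", "Array.field3": 12345 } },
    { $replaceRoot: { newRoot: "$Array" } }  // each hit is its own small document
])
db.collection.aggregate([
    { $match: { "Array.field": "aaa", "Array.field3": 12345 } },
    { $unwind: "$Array" },
    { $match: { "Array.field": "aaa", "Array.field3": 12345 } },
    { $count: "Numero HIT" }  // the hit count, computed separately
])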

Related

How do I calculate the average of multiple fields in a mongoDB document?

I have a collection with the following fields:
_id:612ff22c17286411a17252cf
WELL:"199-H4-15A"
TYPE:"E"
SYSTEM:"HX"
ID:"199-H4-15A_E_HX"
Jan-14:-168.8
Feb-14:-151
Mar-14:-164.1
Apr-14:-168.7
May-14:-172.6
Jun-14:-177.3
Jul-14:-177.6
Aug-14:-171.9
Sep-14:-138.9
Oct-14:-130.3
Nov-14:-163.8
Dec-14:-161.4
Jan-15:-168.7
Feb-15:-168.9
Mar-15:-168.6
Apr-15:-164.6
May-15:-141.7
Jun-15:-153.5
Jul-15:-163.7
Aug-15:-167.7
and I am trying to take the average of all of the month fields (e.g. all fields like "Jan-14", "Feb-14", and so on). I was thinking of somehow pushing all of the month field data into an array and then averaging the values, but I would like to avoid having to list all of the individual field names. Below is what I have so far.
[{$match: {
    'WELL': '199-H4-15A'
}}, {$group: {
    _id: null,
    MonthAverageFlows: {
        $push: {
            $isNumber: ['$all']
        }
    }
}}, {$unwind: '$MonthAverageFlows'}, {$group: {
    _id: null,
    average: {
        $avg: '$MonthAverageFlows.value'
    }
}}]
All that comes out is null. Any and all help would be appreciated. The raw data is in CSV form:
WELL, TYPE, SYSTEM, ID, JAN-14, FEB-14, . . .
"199-H4-15A", "E", "HX", "199-H4-15A_E_HX", -168.8, -151, . . .
Using dynamic values as field names is generally considered an anti-pattern and you should avoid it. You are likely to introduce unnecessary difficulty in composing and maintaining your queries.
Nevertheless, you can do the following in an aggregation pipeline, as sketched below:
$objectToArray to convert your raw document into an array of k-v tuples
$filter the array to keep only the monthly data
$avg to calculate the average
Here is the Mongo playground for your reference.
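A minimal sketch of that pipeline (assuming MongoDB 4.4+ for $isNumber; the exact playground contents may differ):
db.collection.aggregate([
    { $match: { WELL: "199-H4-15A" } },
    // turn the document into an array of { k, v } pairs
    { $project: { kv: { $objectToArray: "$$ROOT" } } },
    // keep only the numeric values, i.e. the month fields
    { $project: {
        months: {
            $filter: {
                input: "$kv",
                as: "item",
                cond: { $isNumber: "$$item.v" }
            }
        }
    } },
    // average the surviving values
    { $project: { MonthAverageFlows: { $avg: "$months.v" } } }
])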

mongodb how to get a document which has max value of each "group with the same key" [duplicate]

This question already has answers here:
MongoDB - get documents with max attribute per group in a collection
(2 answers)
Closed 5 years ago.
I have a collection:
{'_id':'008','name':'ada','update':'1504501629','star':3.6,'desc':'ok', ...}
{'_id':'007','name':'bob','update':'1504501614','star':4.2,'desc':'gb', ...}
{'_id':'005','name':'ada','update':'1504501532','star':3.2,'desc':'ok', ...}
{'_id':'003','name':'bob','update':'1504501431','star':4.5,'desc':'bg', ...}
{'_id':'002','name':'ada','update':'1504501378','star':3.4,'desc':'no', ...}
{'_id':'001','name':'ada','update':'1504501325','star':3.6,'desc':'ok', ...}
{'_id':'000','name':'bob','update':'1504501268','star':4.3,'desc':'gg', ...}
...
The result I want is, for each 'name', the document with the max value of 'update' (i.e. the newest document for that 'name'), as the whole document:
{'_id':'008','name':'ada','update':'1504501629','star':3.6,'desc':'ok', ...}
{'_id':'007','name':'bob','update':'1504501614','star':4.2,'desc':'gb', ...}
...
How can I do this most effectively?
Currently I do it in Python like this:
result = []
for name in db.collection.distinct('name'):
    result.append(db.collection.find({'name': name}).sort('update', -1)[0])
Doesn't this run 'find' too many times?
=====
I do this to crawl data by 'name' and collect many other keys, and every time I insert a document I set a key named 'update'.
When using the database, I want the newest document for a specific 'name', so it looks like I can't just use $group.
What should I do? Redesign the DB structure, or is there a better way to do the find?
=====
Improved!
I tried creating an index on 'name' & 'update', and the process was shortened from half an hour to 30 seconds!
But I'd still welcome a better solution ^_^
Your use case is a very good fit for aggregation. As I see in your question, you already know that but can't figure out how to use $group and take the whole document that has the max update. If you $sort your documents before $group, you can use the $first operator. Then there is no need to send a find query for each name.
db.collection.aggregate([
    { $sort: { "name": 1, "update": -1 } },
    { $group: { _id: "$name", "update": { $first: "$update" }, "doc_id": { $first: "$_id" } } }
])
I did not add an extra $project stage to the aggregation; you can just add the fields you want in the result to $group with the $first operator.
Additionally, if you look closer at the $sort operation, you can see it uses your newly created index, so you did well to add it; otherwise I would have recommended it too :)
Update: For your question in comment:
You would have to write all the keys in $group. But if you think that will look bad, or new fields will come in the future and you don't want to rewrite $group each time, I would do this:
First get all the _id fields of the desired documents in the aggregation, and then get those documents in one find query with the $in operator, as in the sketch below.
db.collection.find( { "_id": { $in: [<ids returned in aggregation>] } } )
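Putting the two steps together in the mongo shell might look like this (a sketch; doc_id follows the naming used above):
// step 1: collect the newest _id per name via the sorted $group
var ids = db.collection.aggregate([
    { $sort: { "name": 1, "update": -1 } },
    { $group: { _id: "$name", "doc_id": { $first: "$_id" } } }
]).map(function (doc) { return doc.doc_id; });
// step 2: fetch the full documents in a single indexed query
db.collection.find({ "_id": { $in: ids } })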

Meteor collection get last document of each selection

Currently I use the following find query to get the latest document of a certain ID
Conditions.find({
caveId: caveId
},
{
sort: {diveDate:-1},
limit: 1,
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
How can I do the same for multiple ids, with $in for example?
I tried it with the following query. The problem is that it limits the result to one document across all the found caveIds, but it should apply the limit per caveId.
Conditions.find({
caveId: {$in: caveIds}
},
{
sort: {diveDate:-1},
limit: 1,
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
One solution I came up with is using the aggregate functionality.
var conditionIds = Conditions.aggregate(
[
{"$match": { caveId: {"$in": caveIds}}},
{
$group:
{
_id: "$caveId",
conditionId: {$last: "$_id"},
diveDate: { $last: "$diveDate" }
}
}
]
).map(function(child) { return child.conditionId});
var conditions = Conditions.find({
_id: {$in: conditionIds}
},
{
fields: {caveId: 1, "visibility.visibility":1, diveDate: 1}
});
You don't want to use $in here as noted. You could solve this problem by looping through the caveIds and running the query on each caveId individually.
You're basically looking at a join query here: you need all caveIds and then a lookup of the last dive for each.
In my opinion, this is a problem of database schema/denormalization (but this is only an opinion!):
You could, as mentioned here, look up all caveIds and then run the single query for each, every time you need to look up the last dives.
However, I think you are much better off recording/updating the last dive inside your cave document, and then looking up all caveIds of interest, pulling only the lastDive field.
That will immediately give you what you need, rather than going through expensive search/sort queries. This comes at the expense of maintaining that field in the document, but it sounds like it should be fairly trivial: you only need to update one field when a new event occurs, as sketched below.
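A sketch of that denormalization (the Caves collection and the lastDive field are assumed names for illustration): whenever a new condition is recorded, mirror it onto the cave document.
// on every new dive condition, update the owning cave's lastDive field
Conditions.insert(condition);
Caves.update(
    { _id: condition.caveId },
    { $set: { lastDive: {
        diveDate: condition.diveDate,
        visibility: condition.visibility.visibility
    } } }
);
// reading the latest dive per cave then becomes a single cheap query
Caves.find({ _id: { $in: caveIds } }, { fields: { lastDive: 1 } });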

Group By key Meteor Collection

Hello, I've searched a lot before asking this question and still have not found a decent answer.
I have a collection (it's copying from MSSQL table every x second) like this: https://ekhmoi.tinytake.com/sf/MzU2MTcwXzIwNDcxNTg
As you can see, there are records which have the same key (MessageId).
My goal is to group them by taking the MessageId + the Message of each record that has the same MessageId, and finally insert the result into a new collection.
so final result should look like this:
https://ekhmoi.tinytake.com/sf/MzU2MTc3XzIwNDcyMDY
Any idea how I can do this?
You can use aggregation to group your collection data into your final result, and the process is actually very simple.
First of all run meteor add meteorhacks:aggregate and meteor add mikowals:batch-insert if you have not yet added these two packages.
Assuming CollectionA is the first collection and CollectionB is the second, here is how I would group data from CollectionA and write the final result into CollectionB:
let pipeline = [
{$project: {TraceId: 1, MessageId: 1, Message: 1}},
{$group: {
_id: "$MessageId",
Message: {$push: "$Message"},
TraceId: {$first: "$TraceId"}
}},
{$project: {
_id: 0,
MessageId: "$_id",
Message: 1,
TraceId: 1
}}
];
let groupedData = CollectionA.aggregate(pipeline);
CollectionB.batchInsert(groupedData);
Note that this example is just a representation of my idea, so it may not work if you copy-paste it directly into your code.

Result from "aggregate with unwind" is different from the "find with count"?

Here is a few documents from my collections:
{"make":"Lenovo", "model":"Thinkpad T430"},
{"make":"Lenovo", "model":"Thinkpad T430", "problems":["Battery"]},
{"make":"Lenovo", "model":"Thinkpad T430", "problems":["Battery","Brakes"]}
As you can see, some documents have no problems, some have only one problem, and some have a few problems in a list.
I want to calculate how many reviews have a specific "problem" (like "Battery") in problems list.
I have tried to use the following aggregate command:
db.reviews.aggregate([
    { $match: { model: "Thinkpad T430" } },
    { $unwind: "$problems" },
    { $group: {
        _id: '$problems',
        count: { $sum: 1 }
    }}
])
For the "Battery" problem the count was 382. I also decided to double-check this result with find() and count():
db.reviews.find({model:"Thinkpad T430",problems:"Battery"}).count()
Result was 362.
Why do I have this difference? And what is the right way to calculate it?
You likely have documents in the collection where problems contains more than one "Battery" string in the array.
When using $unwind, each of these will result in its own doc, so the subsequent $group operation counts them separately.
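For example, a document with "problems": ["Battery", "Battery"] is counted twice after $unwind but only once by find().count(). One way to count documents rather than array entries is to deduplicate each array before unwinding (a sketch; $setUnion with an empty array removes duplicate entries):
db.reviews.aggregate([
    { $match: { model: "Thinkpad T430" } },
    // $setUnion against [] keeps only unique entries per document
    { $project: { problems: { $setUnion: ["$problems", []] } } },
    { $unwind: "$problems" },
    { $group: { _id: "$problems", count: { $sum: 1 } } }
])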