I make extensive use of embedded documents in my MongoDB database, and I'm running into speed problems when trying to add additional data. As an example, I have a document that looks a bit like this:
{
    "date" : <<the date>>,
    "name" : "thisName",
    "basket": [
        {
            "stock": "IBM",
            "quantity": 1000.0,
            "profit": 10.0
        },
        ...
        {
            "stock": "MSFT",
            "quantity": 2000.0,
            "profit": 30.0
        }
    ]
}
What I want to do is add 5 new fields to each embedded document, like this:
{
    "date" : <<the date>>,
    "name" : "thisName",
    "basket": [
        {
            "stock": "IBM",
            "quantity": 1000.0,
            "profit": 10.0,
            "new_1": 10.0,
            "new_2": 10.0,
            "new_3": 10.0,
            "new_4": 10.0,
            "new_5": 10.0
        },
        ...
        {
            "stock": "MSFT",
            "quantity": 2000.0,
            "profit": 30.0,
            "new_1": 10.0,
            "new_2": 10.0,
            "new_3": 10.0,
            "new_4": 10.0,
            "new_5": 10.0
        }
    ]
}
I started doing this using find().update_one() in a for loop, identifying each embedded document explicitly and updating it with "$set". This approach works, but it is very slow. If my collection were small I'm sure this wouldn't matter, but as it is it's huge (hundreds of millions of documents). It's probably so slow because the entire document has to be moved every time I add a set of fields. With that in mind, I attempted to add the new fields to all the embedded documents in one go, by leaving the find query empty and removing the positional $ from the "$set" command. A little like this (in pymongo):
bulk.find({"date": dates[i],
"strategyId": strategyInfo[siOffset[l]][ID]
}).update({
"$set": {
"basket.new_1": 0.0,
"basket.new_2": 0.0,
"basket.new_3": 0.0,
"basket.new_4": 0.0,
"basket.new_5": 0.0
}
})
This approach throws an error: cannot use the part (basket of basket.new_5) to traverse the element ({basket: ...
Is anyone able to give some insight as to what I'm doing wrong? Is it even possible to do this?
You can use a recursive function like this. First, find all the data to update, then update the matched documents one at a time:
db.collection('game_users').find(
    { "date": dates[i], "strategyId": strategyInfo[siOffset[l]][ID] }
).toArray(function (err, data) {
    var n = 0;  // separate counter, so the outer dates[i] is not shadowed
    function data_Update() {
        if (n != data.length) {
            db.collection('game_users').update(
                { "_id": data[n]._id },  // target each matched document by its _id
                { $set: {
                    "basket.new_1": 0.0,
                    "basket.new_2": 0.0,
                    "basket.new_3": 0.0,
                    "basket.new_4": 0.0,
                    "basket.new_5": 0.0
                } },
                function (err, resp) {
                    n++;
                    data_Update();  // move on to the next document
                }
            );
        }
    }
    data_Update();  // kick off the sequential updates
});
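Note that $set on "basket.new_1" treats basket as an embedded document, while it is actually an array - which is exactly what the traversal error complains about. If you are on MongoDB 3.6 or newer, the all-positional operator $[] sets the fields on every element of the array, so the whole job becomes a single statement with no looping. A sketch in shell syntax, reusing the filter from the question:

db.game_users.updateMany(
    { "date": dates[i], "strategyId": strategyInfo[siOffset[l]][ID] },
    { $set: {
        "basket.$[].new_1": 0.0,  // $[] applies the $set to every element of basket
        "basket.$[].new_2": 0.0,
        "basket.$[].new_3": 0.0,
        "basket.$[].new_4": 0.0,
        "basket.$[].new_5": 0.0
    } }
)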
I need a collection with a structure like this:
{
"_id" : ObjectId("5ffc3e2df14de59d7347564d"),
"name" : "MyName",
"pays" : "de",
"actif" : 1,
"details" : {
"pt" : {
"title" : "MongoTime PT",
"availability_message" : "In stock",
"price" : 23,
"stock" : 1,
"delivery_location" : "Portugal",
"price_shipping" : 0,
"updated_date" : ISODate("2022-03-01T20:07:20.119Z"),
"priority" : false,
"missing" : 1,
},
"fr" : {
"title" : "MongoTime FR",
"availability_message" : "En stock",
"price" : 33,
"stock" : 1,
"delivery_location" : "France",
"price_shipping" : 0,
"updated_date" : ISODate("2022-03-01T20:07:20.119Z"),
"priority" : false,
"missing" : 1,
}
}
}
How can I create an index for each subdocument in 'details'? Or maybe it would be better to use an array?
A query like the one below currently takes very long (about 1 hour). How can I speed it up?
query = {"details.pt.missing": {"$in": [0, 1, 2, 3]}, "pays": 'de'}
db.find(query, {"_id": false, "name": true}, sort=[("details.pt.updated_date", 1)], limit=300)
An array type would be better; it has several advantages.
(1) You can include a new field which has values like pt, fr, xy, ab, etc. For example:
details: [
{ type: "pt", title : "MongoTime PT", missing: 1, other_fields: ... },
{ type: "fr", title : "MongoTime FR", missing: 1, other_fields: ... },
{ type: "xy", title : "MongoTime XY", missing: 2, other_fields: ... },
// ...
]
Note the introduction of the new field type (this can be any name representing the field data).
(2) You can also index the array's sub-document fields, which can improve query performance. Indexes on array fields are referred to as Multikey Indexes.
The index can be on a field used in a query filter, for example "details.missing". This key can also be part of a Compound Index. This can help a query filter like the one below:
{ pays: "de", "details.type": "pt", "details.missing": { $in: [ 0, 1, 2, 3 ] } }
NOTE: You can verify that a query actually uses an index by generating a query plan, applying the explain method on the find.
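For example (shell syntax, with the filter from above) - an IXSCAN stage in the winning plan confirms the index is used:

db.products.find(
    { "pays": "de", "details.type": "pt", "details.missing": { "$in": [0, 1, 2, 3] } }
).explain("executionStats")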
(3) Also, see Embedded Document Pattern as explained in the Model One-to-Many Relationships with Embedded Documents.
I'm pretty new to Mongo and queries. I'm trying to build a query that finds results matching three types of dog breeds, checks two additional criteria, and finally sorts everything by age. All the data comes from a CSV file (screenshot); there aren't any sub-categories in any of the entries.
db.animals.find({
"animal_id" : 1,
"breed" : "Labrador Retriever Mix",
"breed" : "Chesapeake Bay Retriever",
"breed" : "Newfoundland",
$and : [ { "age_upon_outcome_in_weeks" :{"$lt" : 156, "$gte" : 26} ],
$and: {"sex_upon_outcome" : "Intact Female"}}).sort({"age_upon_outcome_in_weeks" : 1})
This throws a number of errors, such as:
Error: error: {
"ok" : 0,
"errmsg" : "$and must be an array",
"code" : 2,
"codeName" : "BadValue"
}
What am I messing up? Or is there a better way to do it?
As mentioned by takis in the comments, you cannot repeat a key in a Mongo query - you have to imagine that your query document becomes a JSON object, and each time a key is repeated it replaces the previous one. To get around this problem, MongoDB supports the $or and $and operators. For complex queries like this one, I would recommend starting with a global $and whose elements each contain a single constraint or a $or constraint. Your query becomes this:
db.coll.find({
"$and": [
{ "animal_id": 1 },
{ "age_upon_outcome_in_weeks": { "$lt": 156, "$gte": 26 } },
{ "sex_upon_outcome": "Intact Female" },
{ "$or": [
{ "breed": "Labrador Retriever Mix" },
{ "breed": "Chesapeake Bay Retriever" },
{ "breed": "Chesapeake Bay Retriever" },
{ "breed": "Newfoundland" }
]
}
]
})
.sort({"age_upon_outcome_in_weeks" : 1})
--- edit
You can also consider using $in instead of $or:
db.coll.find({
"animal_id": 1,
"age_upon_outcome_in_weeks": { "$lt": 156, "$gte": 26 },
"sex_upon_outcome": "Intact Female",
"breed": { "$in": [
"Labrador Retriever Mix",
"Chesapeake Bay Retriever",
"Chesapeake Bay Retriever",
"Newfoundland"
] }
})
.sort({"age_upon_outcome_in_weeks" : 1})
I am using an aggregation pipeline with the MongoDB Java driver version 3.6. If I have documents that look something like:
doc1 --
{
"CAR": {
"VIN": "ASDF1234",
"YEAR": "2018",
"MAKE": "Honda",
"MODEL": "Accord"
},
"FEATURES": [
{
"AUDIO": "MP3",
"TIRES": "All Season",
"BRAKES": "ABS"
}
]
}
doc2 --
{
"CAR": {
"VIN": "ASDF1234",
"AVAILABILITY": "In Stock"
}
}
And if I submit a query like:
collection.aggregate(
Arrays.asList(
Aggregates.match(
and(
in("CAR.VIN", vinList),
or(
eq("CAR.MAKE", carMake),
eq("CAR.AVAILABILITY", carAvailability),
)
)
)
)
);
Let us assume that for every VIN there are exactly two different records matching the "CAR.VIN" criterion, so I am going to get two results. Rather than deal with two results each time, I would like to merge the documents so that the result looks like this:
{
"CAR": {
"VIN": "ASDF1234",
"YEAR": "2018",
"MAKE": "Honda",
"MODEL": "Accord",
"AVAILABILITY": "In Stock"
},
"FEATURES": [
{
"AUDIO": "MP3",
"TIRES": "All Season",
"BRAKES": "ABS"
}
]
}
The example where I have two and only two results trivializes my need for this. Imagine that vinList is a list of 10000 values, and it might return 2 x 10000 documents. When I return an AggregateIterable to the client that is calling my code, I do not want to impose the requirement that they group or collate the results in any way; they should receive one document for each result, with all of the information they will want to parse, cleanly and easily.
Of course, people will suggest that the data is simply combined into one document with all of the data in the MongoDB collection. For reasons that I cannot control, there are two separate documents corresponding to each VIN in the same collection, and that is something that I am unable to change. There is a value in our system that makes this more reasonable than it might seem, so please don't focus on this apparent problem with the data.
I am trying, with not much luck, to utilize the Aggregates.group() operation to merge the fields in my aggregation pipeline. Accumulators.push seems to be the closest operation to what I need, but I do not want to complicate the document structure with extra arrays, etc. Is there a straightforward approach that I am not seeing?
You can try $mergeObjects, added in MongoDB v3.6:
db.cc.aggregate(
[
{
$group: {
_id : "$CAR.VIN",
CAR : {$mergeObjects : "$CAR"},
FEATURES : {$mergeObjects : {$arrayElemAt : ["$FEATURES", 0 ]}}
}
}
]
).pretty()
Result:
{
"_id" : "ASDF1234",
"CAR" : {
"VIN" : "ASDF1234",
"YEAR" : "2018",
"MAKE" : "Honda",
"MODEL" : "Accord",
"AVAILABILITY" : "In Stock"
},
"FEATURES" : {
"AUDIO" : "MP3",
"TIRES" : "All Season",
"BRAKES" : "ABS"
}
}
To get FEATURES as an array instead:
db.cc.aggregate(
[
{
$group: {
_id : "$CAR.VIN",
CAR : {$mergeObjects : "$CAR"},
FEATURES : {$push : {$arrayElemAt : ["$FEATURES", 0 ]}}
}
}
]
).pretty()
Result (note the null entry, contributed by doc2, which has no FEATURES field):
{
"_id" : "ASDF1234",
"CAR" : {
"VIN" : "ASDF1234",
"YEAR" : "2018",
"MAKE" : "Honda",
"MODEL" : "Accord",
"AVAILABILITY" : "In Stock"
},
"FEATURES" : [
{
"AUDIO" : "MP3",
"TIRES" : "All Season",
"BRAKES" : "ABS"
},
null
]
}
I have a MongoDB database with currently about 30 collections, ranging from 1.5 GB to 2.5 GB each, and I need to reformat and sort the data into nested groups and dump them to a new collection. This database will eventually have about 2000 collections of the same type and format of data.
Data is currently available like this:
{
"_id" : ObjectId("598392d6bab47ec75fd6aea6"),
"orderid" : NumberLong("4379116282"),
"regionid" : 10000068,
"systemid" : 30045305,
"stationid" : 60015036,
"typeid" : 7489,
"bid" : 0,
"price" : 119999.91,
"minvolume" : 1,
"volremain" : 6,
"volenter" : 8,
"issued" : "2015-12-31 09:12:29",
"duration" : "14 days, 0:00:00",
"range" : 65535,
"reportedby" : 0,
"reportedtime" : "2016-01-01 00:22:42.997926"} {...} {...}
I need to group these by regionid > typeid > bid like this:
{"regionid": 10000176,
"orders": [
{
"typeid": 34,
"buy": [document, document, document, ...],
"sell": [document, document, document, ...]
},
{
"typeid": 714,
"buy": [document, document, document, ...],
"sell": [document, document, document, ...]
}]
}
Here's a more verbose sample of my ideal output format: https://gist.github.com/BatBrain/cd3426c29ce8ca8152efd1fa06ca1392
I have been trying to use the db.collection.aggregate() to do this, running this command as an initial test step:
db.day_2016_01_01.aggregate( [{ $group : { _id : "$regionid", entries : { $push: "$$ROOT" } } },{ $out : "test_group" }], { allowDiskUse:true, cursor:{} })
But I have been getting this message, "errmsg" : "BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit."
I tried looking into how to use the cursor object, but I'm pretty confused about how to apply it in this situation, or even if that is a viable option. Any advice or solutions would be great.
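One possible direction (a sketch, not a tested solution - it assumes no single (regionid, typeid, bid) bucket exceeds the BSON limits): group at a finer granularity first, so that no single $push array has to hold an entire region's orders, and assemble the region-level nesting in later stages or in application code. Note that every document written by $out must stay under the 16MB BSON document limit, so region-level documents holding all orders may be impossible regardless of how the pipeline is written.

db.day_2016_01_01.aggregate([
    // One output document per (regionid, typeid, bid) bucket keeps each
    // pushed array far smaller than a whole region's worth of orders.
    { $group: {
        _id: { regionid: "$regionid", typeid: "$typeid", bid: "$bid" },
        entries: { $push: "$$ROOT" }
    } },
    { $out: "test_group" }
], { allowDiskUse: true, cursor: {} })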
I have one document per day per meter. How can I add another subdocument to the data array, and create the whole document if it doesn't exist?
{
"key": "20120418_123456789",
"data":[
{
"Meter": 123456789,
"Dt": ISODate("2011-12-29T16:00:00.0Z"),
"Energy": 25,
"PMin": 11,
"PMax": 16
}
],
"config": {"someparam": 4.5}
}
Can I use upsert for that purpose?
If the document already exists, the result would be:
{
"key": "20120418_123456789",
"data":[
{
"Meter": 123456789,
"Dt": ISODate("2011-12-29T16:00:00.0Z"),
"Energy": 25,
"PMin": 11,
"PMax": 16
},
{
"Meter": 123456789,
"Dt": ISODate("2011-12-29T16:15:00.0Z"),
"Energy": 22,
"PMin": 13,
"PMax": 17
}
],
"config": {"someparam": 4.5}
}
Thanks in advance
I think what you want is the $addToSet command - that will push an element to an array only if it does not already exist. I've simplified your example a bit for brevity:
db.meters.findOne()
{
"_id" : ObjectId("4f8e95a718bc9c7da1e6511a"),
"config" : {
"someparam" : 4.5
},
"data" : [
{
"Meter" : 123456789,
}
],
"key" : "20120418_123456789"
}
Now run:
db.meters.update({"key" : "20120418_123456789"}, {"$addToSet": {"data" : {"Meter" : 1234}}})
And we get the updated version:
db.meters.findOne()
{
"_id" : ObjectId("4f8e95a718bc9c7da1e6511a"),
"config" : {
"someparam" : 4.5
},
"data" : [
{
"Meter" : 123456789,
},
{
"Meter" : 1234
}
],
"key" : "20120418_123456789"
}
Run the same command again and the result is unchanged.
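To also cover the second half of the question - creating the whole document when it does not exist yet - the same update can be combined with upsert: true, using $setOnInsert (MongoDB 2.4+) for the fields that should only be written on insert. A sketch, with the config value taken from the example above:

db.meters.update(
    { "key": "20120418_123456789" },
    {
        "$addToSet": { "data": { "Meter": 1234 } },
        "$setOnInsert": { "config": { "someparam": 4.5 } }  // only applied when the upsert inserts a new document
    },
    { "upsert": true }
)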
Note: updating in this way is likely to keep growing these documents, especially if the data array is unbounded, causing frequent (relatively expensive) document moves - you should have a look here for ideas on how to mitigate this:
http://www.mongodb.org/display/DOCS/Padding+Factor#PaddingFactor-ManualPadding