MongoDB: Upserting and Sub documents - mongodb

Let's assume the following schema:
{
'_id' : 'star_wars',
'count' : 1234,
'spellings' : [
{ spelling: 'Star wars', total: 10},
{ spelling: 'Star Wars', total : 15},
{ spelling: 'sTaR WaRs', total : 5} ]
}
I can update the count and one of the spellings by doing this:
db.movies.update(
{_id: "star_wars",
'spellings.spelling' : "Star Wars" },
{ $inc :
{ 'spellings.$.total' : 1,
'count' : 1 }}
)
But this form of update doesn't work with upsert. i.e., if I try to update (with upsert) with an _id that doesn't exist, or with a spelling that doesn't already exist, nothing happens.
Is there a solution that allows me to upsert when updating ($inc) a sub-document?
Thanks!

You could change your schema a little, though. If your documents looked like this:
{
'_id' : 'star_wars',
'count' : 1234,
'spellings' :
{
'Star wars': 10,
'Star Wars': 15,
'sTaR WaRs': 5
}
}
Your updates would become as simple as:
db.movies.update({_id:"star_wars"},{$inc:{"spellings.Star Wars":1}},true)

Related

Mongodb multiple subdocument

I need a collection with structure like this:
{
"_id" : ObjectId("5ffc3e2df14de59d7347564d"),
"name" : "MyName",
"pays" : "de",
"actif" : 1,
"details" : {
"pt" : {
"title" : "MongoTime PT",
"availability_message" : "In stock",
"price" : 23,
"stock" : 1,
"delivery_location" : "Portugal",
"price_shipping" : 0,
"updated_date" : ISODate("2022-03-01T20:07:20.119Z"),
"priority" : false,
"missing" : 1,
},
"fr" : {
"title" : "MongoTime FR",
"availability_message" : "En stock",
"price" : 33,
"stock" : 1,
"delivery_location" : "France",
"price_shipping" : 0,
"updated_date" : ISODate("2022-03-01T20:07:20.119Z"),
"priority" : false,
"missing" : 1,
}
}
}
How can i create an index for each subdocument in 'details' ?
Or maybe it's better to do an array ?
Doing a query like this is currently very long (1 hour). How can I do ?
query = {"details.pt.missing": {"$in": [0, 1, 2, 3]}, "pays": 'de'}
db.find(query, {"_id": false, "name": true}, sort=[("details.pt.updated_date", 1)], limit=300)
An array type would be better, as there are advantages.
(1) You can include a new field which has values like pt, fr, xy, ab, etc. For example:
details: [
{ type: "pt", title : "MongoTime PT", missing: 1, other_fields: ... },
{ type: "fr", title : "MongoTime FR", missing: 1, other_fields: ... },
{ type: "xy", title : "MongoTime XY", missing: 2, other_fields: ... },
// ...
]
Note the introduction of the new field type (this can be any name representing the field data).
(2) You can also index on the array sub-document fields, which can improve query performance. Array field indexes are referred as Multikey Indexes.
The index can be on a field used in a query filter. For example, "details.missing". This key can also be part of a Compound Index. This can help a query filter like below:
{ pays: "de", "details.type": "pt", "details.missing": { $in: [ 0, 1, 2, 3 ] } }
NOTE: You can verify the usage of an index in a query by generating a Query Plan, applying the explain method on the find.
(3) Also, see Embedded Document Pattern as explained in the Model One-to-Many Relationships with Embedded Documents.

Use distinct() on a variable in mongodb

I'm trying several queries in mongodb. Each document of my colelction is like this :
{
"_id" : 1,
"name" : 1,
"isReferenceProteome" : 1,
"isRepresentativeProteome" : 1,
"component" : 1,
"reference" : 1,
"upid" : 1,
"modified" : 1,
"taxonomy" : 1,
"superregnum" : 1,
"description" : 1,
"dbReference" : 1
}
the "reference" field has nested fields, one is "authorList", an array containing 'name' fields.
"reference" {
"authorList" [
{"name": "author1"},
{"name": "author2"},
{"name": "author3"} ...etc...
]
}
I have stored in a variable the result of the following query :
var testing = db.mycollection.find({'reference.authorList.30': {$exists: true}})
which stores all documents where the authorList is at least 30 names long.
Then I wanted to use distinct() on this variable, in order to have the distinct names of all authors :
testing.distinct("reference.authorList.name")
I tried this way because my first query returned an empty array :
db.mycollection.distinct( "reference.authorList.name", {"reference.authorList.name.30": {$exists: true}} )
I'm also trying whit $where command, but I got syntaxError for now.
What I am missing ?
Thanks.
Use
db.head_human_prot.distinct( "reference.authorList.name", {"reference.authorList.30": {$exists: true}} )
instead of
db.head_human_prot.distinct( "reference.authorList.name", {"reference.authorList.name.30": {$exists: true}} )
Silly me...

Mongodb- using find() method on an Array of Objects only return first match instead of all

Unlike the other question someone asked where they wanted only one item returned. I HAVE one item returned and I need ALL of the matching objects in the array return. However the second object that matches my query is being completely ignored.
This is what one of the items in the item collection looks like:
{
name: "soda",
cost: .50,
inventory: [
{ flavor: "Grape",
amount: 8 },
{ flavor: "Orange",
amount: 4 },
{ flavor: "Root Beer",
amount: 15 }
]
}
Here is the query I typed in to mongo shell:
Items.find({"inventory.amount" : { $lte : 10} } , { name : 1, "inventory.$.flavor" : 1})
And here is the result:
"_id" : ObjectId("59dbe33094b70e0b5851724c"),
"name": "soda"
"inventory" : [
{ "flavor" : "Grape",
"amount" : 8,
}
]
And here is what I want it to return to me:
"_id" : ObjectId("59dbe33094b70e0b5851724c"),
"name": "soda"
"inventory" : [
{ "flavor" : "Grape",
"amount" : 8
},
{ "flavor" : "Orange",
"amount" : 4
}
]
I'm new to mongo and am dabbling to get familiar with it. I've read through the docs but couldn't find a solution to this though it's quite possible I overlooked it. I'd really love some help. Thanks in advance.
first u can get your result by this query
db.Items.find({"inventory.amount" : { $lte : 10} } , { name : 1, "inventory.flavor" : 1 , "inventory.amount" : 1})

MongoDB Aggregation Framework

I have a document that's structured as follows:
{
'_id' => 'Star Wars',
'count' => 1234,
'spelling' => [ ( 'Star wars' => 10, 'Star Wars' => 15, 'sTaR WaRs' => 5) ]
}
I would like to get the top N documents (by descending count), but with only one one spelling per document (the one with the highest value). It there a way to do this with the aggregation framework?
I can easily get the top 10 results (using $sort and $limit). But how do I get only one spelling per each?
So for example, if I have the following three records:
{
'_id' => 'star_wars',
'count' => 1234,
'spelling' => [ ( 'Star wars' => 10, 'Star Wars' => 15, 'sTaR WaRs' => 5) ]
}
{
'_id' => 'willow',
'count' => 2211,
'spelling' => [ ( 'willow' => 300, 'Willow' => 550) ]
}
{
'_id' => 'indiana_jones',
'count' => 12,
'spelling' => [ ( 'indiana Jones' => 10, 'Indiana Jones' => 25, 'indiana jones' => 5) ]
}
And I ask for the top 2 results, I'll get:
{
'_id' => 'willow',
'count' => 2211,
'spelling' => 'Willow'
}
{
'_id' => 'star_wars',
'count' => 1234,
'spelling' => 'Star Wars'
}
(or something to this effect)
Thanks!
Your schema as designed would make using anything but a MapReduce difficult as you've used the keys of the object as values. So, I adjusted your schema to better match with MongoDB's capabilities (in JSON format as well for this example):
{
'_id' : 'star_wars',
'count' : 1234,
'spellings' : [
{ spelling: 'Star wars', total: 10},
{ spelling: 'Star Wars', total : 15},
{ spelling: 'sTaR WaRs', total : 5} ]
}
Note that it's now an array of objects with a specific key name, spelling, and a value for the total (I didn't know what that number actually represented, so I've called it total in my examples).
On to the aggregation:
db.so.aggregate([
{ $unwind: '$spellings' },
{ $project: {
'spelling' : '$spellings.spelling',
'total': '$spellings.total',
'count': '$count'
}
},
{ $sort : { total : -1 } },
{ $group : { _id : '$_id',
count: { $first: '$count' },
largest : { $first : '$total' },
spelling : { $first: '$spelling' }
}
}
])
Unwind all of the data so the aggregation pipeline can access the various values of the array
Flatten the data to include the key aspects needed by the pipeline. In this case, the specific spelling, the total, and the count.
Sort on the total, so that the last grouping can use $first
Then, group so that only the $first value for each _id is returned, and then also return the count which because of the way it was flattened for the pipeline, each temporary document will contain the count field.
Results:
[
{
"_id" : "star_wars",
"count" : 1234,
"largest" : 15,
"spelling" : "Star Wars"
},
{
"_id" : "indiana_jones",
"count" : 12,
"largest" : 25,
"spelling" : "Indiana Jones"
},
{
"_id" : "willow",
"count" : 2211,
"largest" : 550,
"spelling" : "Willow"
}
]

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that in this case the 'minute' subdocument always only has the last hour: { minute: metric} entry, the minute subdocument does not create new entries for other hours, it's always overwriting the one entry.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed you are simply setting a field ('minute')to a particular value, the only reason it is an addition the first time you update is because the field itself does not exist and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet this is required for a new field to be added:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
I'm setting the metrics like this:
data = {"$set": {}}
for metric in csv:
date_utc = metric['date'].astimezone(pytz.utc)
data["$set"]["minute.%d.%d" % (date_utc.hour,
date_utc.minute)] = float(metric['metric'])
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run it updates those fields.