MongoDB: Querying through multiple relatively unknown keys for a value

Right now, we are using MongoDB 1.2.2 to create a database and store values. Our documents look like this:
"file" : "1" , "tools": { "foo": { "status": "pending"} }
"file" : "2" , "tools": { "bar": { "status": "pending" } }
"file" : "3" , "tools": { "foo": { "status": "running" } }
"file" : "4" , "tools": { "bar": { "status": "done" } }
"file" : "5" , "tools": { "foo": { "status": "done" } }
We want to query for every document that has { "status" : "pending" } under any tool. We do not want to use {"tools.foo.status" : "pending"} because we will have many different tool names other than foo and bar. To make it clearer, we want to do something like this: {"tools.*.status" : "pending"}

No, you can't do that. I'm afraid you'll have to maintain your own index for this. That is, for every insert/update to the files collection, do an upsert to the file_status_index collection to update current status.
Querying is also a two-step process: first query the index collection to get the ids, and then issue $in query to the files collection to get actual data.
This may sound scary, but that's a price you have to pay with this schema.
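The two-step approach described above can be sketched in plain JavaScript. This is only an in-memory illustration: the arrays and the Map stand in for the files and file_status_index collections, and the helper names (upsertStatus, fileIdsWithStatus, findFilesByIds) are hypothetical, not MongoDB API calls.

```javascript
// In-memory stand-in for the files collection (sample data from the question).
const files = [
  { file: "1", tools: { foo: { status: "pending" } } },
  { file: "2", tools: { bar: { status: "pending" } } },
  { file: "3", tools: { foo: { status: "running" } } },
];

// Stand-in for file_status_index: one entry per (file, tool) pair.
const fileStatusIndex = new Map();

// Mimics an upsert: insert the entry or overwrite the existing one.
function upsertStatus(file, tool, status) {
  fileStatusIndex.set(`${file}|${tool}`, { file, tool, status });
}

// Keep the index in sync on every insert/update to files.
for (const doc of files) {
  for (const [tool, info] of Object.entries(doc.tools)) {
    upsertStatus(doc.file, tool, info.status);
  }
}

// Step 1: query the index collection to get the ids.
function fileIdsWithStatus(status) {
  const ids = new Set();
  for (const entry of fileStatusIndex.values()) {
    if (entry.status === status) ids.add(entry.file);
  }
  return [...ids];
}

// Step 2: the equivalent of an $in query against files.
function findFilesByIds(ids) {
  return files.filter((doc) => ids.includes(doc.file));
}

const pendingFiles = findFilesByIds(fileIdsWithStatus("pending"));
```

With a real database, each helper would be replaced by the corresponding upsert and find calls; the point of the sketch is only the bookkeeping that the application has to take on.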

Firstly, you should upgrade your MongoDB. 1.2.2 is a really old version.
Secondly, you cannot run the query you are asking for directly. You can do this with Map/Reduce instead.
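As a rough sketch of the per-document check a map function would have to perform, the following plain JavaScript inspects every key under tools without knowing its name. It runs on in-memory objects only; hasToolWithStatus is a hypothetical helper, and no Map/Reduce API is being called here.

```javascript
// Returns true if any tool under doc.tools has the given status,
// regardless of what the tool key is called.
function hasToolWithStatus(doc, status) {
  return Object.values(doc.tools || {}).some(
    (tool) => tool !== null && typeof tool === "object" && tool.status === status
  );
}

// Sample documents from the question.
const docs = [
  { file: "1", tools: { foo: { status: "pending" } } },
  { file: "2", tools: { bar: { status: "pending" } } },
  { file: "3", tools: { foo: { status: "running" } } },
  { file: "4", tools: { bar: { status: "done" } } },
  { file: "5", tools: { foo: { status: "done" } } },
];

const pendingFileIds = docs
  .filter((doc) => hasToolWithStatus(doc, "pending"))
  .map((doc) => doc.file);
```

Note that this check has to visit every document, which is the cost the other answers warn about.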

I think it's time to ask why you're storing things the way you are.
There is no efficient way to search this kind of structure; since there is no known keys-only path to get to the value you're filtering on, every single record needs to be expanded every single time, and that's very expensive, especially once your collection no longer fits in RAM.
IMO, you'd be better off with a secondary collection to hold these statuses. Yes, it makes your datastore more relational, but that's because your data is relational.
file_tools:
{ 'file_id' : 1, 'name' : 'foo', 'status' : 'pending' }
{ 'file_id' : 2, 'name' : 'bar', 'status' : 'pending' }
{ 'file_id' : 3, 'name' : 'foo', 'status' : 'running' }
{ 'file_id' : 4, 'name' : 'bar', 'status' : 'done' }
{ 'file_id' : 5, 'name' : 'foo', 'status' : 'done' }
files:
{ 'id': 1 }
{ 'id': 2 }
{ 'id': 3 }
{ 'id': 4 }
{ 'id': 5 }
> // find out which files have pending tools
> files_with_pending_tools = file_tools.find( { 'status' : 'pending' }, { 'file_id' : 1 } )
> //=> [ { 'file_id' : 1 }, { 'file_id' : 2 } ]
>
> // just get the ids
> file_ids_with_pending_tools = files_with_pending_tools.map( function( file_tool ){
>   return file_tool['file_id'];
> })
> //=> [1,2]
>
> // query the files
> files.find({'id': { $in : file_ids_with_pending_tools }})
> //=> [ { 'id' : 1 }, { 'id' : 2 } ]

Related

Retrieving value of an embedded object in mongo

Followup Question
Thanks @4J41 for your spot-on resolution. Along the same lines, I'd also like to validate one other thing.
I have a mongo document that contains an array of strings, and I need to convert this particular array of strings into an array of objects, each containing a key-value pair. Below is my current approach.
Mongo Record:
Same mongo record in my initial question below.
Current Query:
templateAttributes.find({platform:"V1"}).map(function(c){
//instantiate a new array
var optionsArray = [];
for (var i=0;i< c['available']['Community']['attributes']['type']['values'].length; i++){
optionsArray[i] = {}; // creates a new object
optionsArray[i].label = c['available']['Community']['attributes']['type']['values'][i];
optionsArray[i].value = c['available']['Community']['attributes']['type']['values'][i];
}
return optionsArray;
})[0];
Result:
[{label:"well-known", value:"well-known"},
{label:"simple", value:"simple"},
{label:"complex", value:"complex"}]
Is my approach efficient enough, or is there a way to optimize the above query to get the same desired result?
Initial Question
I have a mongo document like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"platform" : "A1",
"available" : {
"Community" : {
"attributes" : {
"type" : {
"values" : [
"well-known",
"simple",
"complex"
],
"defaultValue" : "well-known"
},
[......]
}
I'm trying to query the DB and retrieve only the value of defaultValue field.
I tried:
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
as well as
db.templateAttributes.findOne(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
)
But they both seem to retrieve the entire object hierarchy like below:
{
"_id" : ObjectId("57e3720836e36f63695a2ef2"),
"available" : {
"Community" : {
"attributes" : {
"type" : {
"defaultValue" : "well-known"
}
}
}
}
}
The only way I could get it to work was with find and a map function, but that seems a bit convoluted.
Does anyone have a simpler way to get this result?
db.templateAttributes.find(
{ platform: "A1" },
{ "available.Community.attributes.type.defaultValue": 1 }
).map(function(c){
return c['available']['Community']['attributes']['type']['defaultValue']
})[0]
Output
well-known
You could try the following.
Using find:
db.templateAttributes.find({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 }).toArray()[0]['available']['Community']['attributes']['type']['defaultValue']
Using findOne:
db.templateAttributes.findOne({ platform: "A1" }, { "available.Community.attributes.type.defaultValue": 1 })['available']['Community']['attributes']['type']['defaultValue']
Using aggregation:
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}},
{"$project": {_id:0, default:"$available.Community.attributes.type.defaultValue"}}
]).toArray()[0].default
Output:
well-known
Edit: Answering the updated question: Please use aggregation here.
db.templateAttributes.aggregate([
{"$match":{platform:"A1"}}, {"$unwind": "$available.Community.attributes.type.values"},
{$group: {"_id": null, "val":{"$push":{label:"$available.Community.attributes.type.values",
value:"$available.Community.attributes.type.values"}}}}
]).toArray()[0].val
Output:
[
{
"label" : "well-known",
"value" : "well-known"
},
{
"label" : "simple",
"value" : "simple"
},
{
"label" : "complex",
"value" : "complex"
}
]
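For comparison, once the document is already in application code, the same label/value array can be built with a single Array.prototype.map call. This is a client-side sketch on an in-memory copy of the document, not a replacement for the aggregation above; toOptions is a hypothetical helper name.

```javascript
// Turns ["a", "b"] into [{ label: "a", value: "a" }, { label: "b", value: "b" }].
function toOptions(values) {
  return values.map((v) => ({ label: v, value: v }));
}

// In-memory copy of the relevant part of the sample document.
const templateDoc = {
  platform: "A1",
  available: {
    Community: {
      attributes: {
        type: {
          values: ["well-known", "simple", "complex"],
          defaultValue: "well-known",
        },
      },
    },
  },
};

const options = toOptions(templateDoc.available.Community.attributes.type.values);
```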

How do I add an array of elements in MongoDB to an array in an existing document?

In MongoDB, I'm trying to write a query to add elements from an array to an existing document, but instead of adding the elements as objects:
property: ObjectID(xxx)
the elements are getting added as just
ObjectID(xxx)
Forgive me if I get the terminology wrong. I'm completely new to MongoDB; I normally only work with relational databases. How do I properly add these new elements?
I have a collection called auctions which has two fields: ID and properties. Properties is an array of objects named property. Here's an example with two auction documents:
** I changed the object IDs to make them easier to reference in our discussion
Collection db.auctions
{
"_id" : ObjectId("abc"),
"properties" : [
{
"property" : ObjectId("prop1")
},
{
"property" : ObjectId("prop2")
},
{
"property" : ObjectId("prop3")
}]
}
{
"_id" : ObjectId("def"),
"properties" : [
{
"property" : ObjectId("prop97")
},
{
"property" : ObjectId("prop98")
}]
}
I want to add 3 new properties to auction "abc". How do I do this?
Here is what I tried:
I have an array of properties that looks like this:
Array PropsToAdd
[
ObjectId("prop4"),
ObjectId("prop5"),
ObjectId("prop6")
]
I wrote an update query to push these properties into the properties array in auctions:
db.auctions.update(
{"_id": "abc"}
,
{ $push: { properties: { $each: PropsToAdd } } }
);
This query gave the result below. Notice that instead of adding elements named property with a value from my array, it's just added my values from my array. I obviously need to add that "property" part, but how do I do that?
Collection db.auctions (_id "abc" only)
{
"_id" : ObjectId("abc"),
"properties" : [
{
"property" : ObjectId("prop1")
},
{
"property" : ObjectId("prop2")
},
{
"property" : ObjectId("prop3")
},
ObjectId("prop4"),
ObjectId("prop5"),
ObjectId("prop6")]
}
The result I'm looking for is this:
Collection db.auctions (_id "abc" only)
{
"_id" : ObjectId("abc"),
"properties" : [
{
"property" : ObjectId("prop1")
},
{
"property" : ObjectId("prop2")
},
{
"property" : ObjectId("prop3")
},
{
"property" : ObjectId("prop4")
},
{
"property" : ObjectId("prop5")
},
{
"property" : ObjectId("prop6")
}]
}
Here is some further information on the array of properties I'm adding. I get it from running these queries. Perhaps one of them needs to be changed?
This query gets an array of current properties:
var oldActiveProperties = db.properties.distinct( "saleNumber", { "active": true, "auction": ObjectId("abc") } );
Then those results are used to find properties in the new file that weren't in the old file:
var PropsToAdd = db.newProperties.distinct(
"_id"
, { "saleNumber": { "$nin": oldActiveProperties }, "active": true}
);
The resulting array is what I need to add to the auctions collection.
Use JavaScript's native map() method to turn the array of ids into an array of documents. The following shows this:
var PropsToAdd = db.newProperties.distinct("_id",
{ "saleNumber": { "$nin": oldActiveProperties }, "active": true}
).map(function (p) { return { property: p }; });
db.auctions.update(
{"_id": "abc"},
{ $push: { "properties": { "$each": PropsToAdd } } }
);
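The effect of the map() step can be seen in isolation with plain JavaScript. In this sketch, strings stand in for ObjectId values and an in-memory object stands in for the auction document; the push with a spread only mimics what $push with $each does and is not a MongoDB call.

```javascript
// Strings stand in for ObjectId values (assumption for illustration only).
const newIds = ["prop4", "prop5", "prop6"];

// Wrap each bare id in a { property: id } document.
const propsToAdd = newIds.map((id) => ({ property: id }));

// In-memory stand-in for the auction document.
const auction = {
  _id: "abc",
  properties: [{ property: "prop1" }, { property: "prop2" }, { property: "prop3" }],
};

// Mimics { $push: { properties: { $each: propsToAdd } } }.
auction.properties.push(...propsToAdd);
```

Because every element is wrapped before it is pushed, the new entries have the same shape as the existing ones.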

Need help to search document with random field names

I looked through the MongoDB documentation and googled this question but couldn't really find a suitable answer.
I encountered a problem where I need to search documents in a collection, but 3 field names change from one document to another, even though they are always at the same positions.
In the following example, the 366_DAYS key can be 2_HOURS, 35_DAYS, etc. from document to document, but it will be in the same position.
The _XC4ucB8sEeSybaax341rBg will change to another random string from doc to doc, again it will be at the same position for all docs.
Other fields do not change name and stay at the same position.
I want a query to search for records where debitAmount >=creditAmount or endDate > now().
set02:PRIMARY> db.account.find({ _id: "53e51b1b0cf22cb159fa5f38" }).pretty()
{
"_id" : "53e51b1b0cf22cb159fa5f38",
"_version" : 6,
"_transId" : "e3e96377-a2d2-4b75-a946-f621df182c5e-2719",
"accountBalances" : {
"TEST_TIME" : {
"thresholds" : {
},
"deprovisioned" : false,
"quotas" : {
"366_DAYS" : {
"thresholds" : {
},
"quotaCode" : "366_DAYS",
"credits" : {
"_XC4ucB8sEeSybaax341rBg" : {
"startDate" : ISODate("2014-08-08T18:46:51.351Z"),
"creditAmount" : "86460",
"endDate" : ISODate("2014-08-09T18:48:19Z"),
"started" : true,
"debits" : {
"consolidated" : {
"creationDate" : ISODate("2014-08-08T19:15:55.396Z"),
"debitAmount" : "1300",
"debitId" : "consolidated"
}
},
"creditId" : "_XC4ucB8sEeSybaax341rBg"
}
}
}
},
"expiredReservations" : {
},
"accountBalanceCode" : "TEST_TIME",
"reservations" : {
}
}
},
"subscriberId" : "53e51b1b0cf22cb159fa5f38"
}
Can you use arrays for quotas and credits? That would make the path be the same.
"quotas": [
{
"days": 365,
"thresholds": {},
"credits": [
{
"id": "_XC4ucB8sEeSybaax341rBg"
}
]
}
]
Two cases come to mind. Which one applies to you is unclear to me from the question, so I am providing for both possibilities.
CASE 1:
You will always have either 366_DAYS, 2_HOURS or 35_DAYS inside quotas and only one possible creditId per document. If this is the case, then why replicate the quotaCode and the creditId both as a sub-field and as the key inside quotas and credits respectively? You could alter the structure of your document as follows:
{
"_id": "53e51b1b0cf22cb159fa5f38",
"_version": 6,
"_transId": "e3e96377-a2d2-4b75-a946-f621df182c5e-2719",
"accountBalances": {
"TEST_TIME": {
"thresholds": {},
"deprovisioned": false,
"quotas": {
"thresholds": {
},
"quotaCode": "366_DAYS",
"credits": {
"startDate": ISODate("2014-08-08T18:46:51.351Z"),
"creditAmount": "86460",
"endDate": ISODate("2014-08-09T18:48:19Z"),
"started": true,
"debits": {
"consolidated": {
"creationDate": ISODate("2014-08-08T19:15:55.396Z"),
"debitAmount": "1300",
"debitId": "consolidated"
}
},
"creditId": "_XC4ucB8sEeSybaax341rBg"
}
},
"expiredReservations": {
},
"accountBalanceCode": "TEST_TIME",
"reservations": {
}
}
},
"subscriberId": "53e51b1b0cf22cb159fa5f38"
}
Now the fieldPath for fields in your queries would be:
"accountBalances.TEST_TIME.quotas.credits.creditAmount"
"accountBalances.TEST_TIME.quotas.credits.debits.consolidated.debitAmount"
"accountBalances.TEST_TIME.quotas.credits.startDate"
CASE 2:
quotas and credits may contain more than one subdocument. In this case viktortnk's approach of having quotas and credits as arrays will work. The fieldPath for your queries may then be written as:
"accountBalances.TEST_TIME.quotas.[zero-base-index].credits.[zero-base-index].creditAmount"
"accountBalances.TEST_TIME.quotas.[zero-base-index].credits.[zero-base-index].debits.consolidated.debitAmount"
"accountBalances.TEST_TIME.quotas.[zero-base-index].credits.[zero-base-index].startDate"
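If the schema cannot be changed, one fallback is to fetch the document and walk the variable keys in application code. The sketch below does that in plain JavaScript for the condition from the question (debitAmount >= creditAmount or endDate > now). It is a client-side check on an in-memory object, not a server-side query; findMatchingCredits is a hypothetical helper, and it assumes the amounts are numeric strings and that only the consolidated debit matters, as in the sample document.

```javascript
// Walks accountBalances.*.quotas.*.credits.* without knowing the key names.
function findMatchingCredits(account, now) {
  const matches = [];
  for (const balance of Object.values(account.accountBalances || {})) {
    for (const [quotaCode, quota] of Object.entries(balance.quotas || {})) {
      for (const [creditId, credit] of Object.entries(quota.credits || {})) {
        // Amounts are stored as strings in the sample, so convert before comparing.
        const creditAmount = Number(credit.creditAmount);
        const debitAmount = Number(credit.debits?.consolidated?.debitAmount ?? 0);
        if (debitAmount >= creditAmount || credit.endDate > now) {
          matches.push({ quotaCode, creditId });
        }
      }
    }
  }
  return matches;
}

// Trimmed in-memory copy of the sample document.
const account = {
  _id: "53e51b1b0cf22cb159fa5f38",
  accountBalances: {
    TEST_TIME: {
      quotas: {
        "366_DAYS": {
          credits: {
            _XC4ucB8sEeSybaax341rBg: {
              creditAmount: "86460",
              endDate: new Date("2014-08-09T18:48:19Z"),
              debits: { consolidated: { debitAmount: "1300" } },
            },
          },
        },
      },
    },
  },
};

const beforeExpiry = findMatchingCredits(account, new Date("2014-08-09T00:00:00Z"));
const afterExpiry = findMatchingCredits(account, new Date("2015-01-01T00:00:00Z"));
```

Here beforeExpiry contains the one credit because its endDate is still in the future, while afterExpiry is empty. This scans every document it is given, so it only makes sense for small result sets.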

Is it possible to query MongoDB, using ONLY Array([x][y[x][z]]) Approach? NOT knowing Elements' Content?

This is the first of 7 test/example documents, in collection "SoManySins."
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"Treats" : "Sin1 = Gluttony",
"Sin1" : "Gluttony",
"Favourited" : "YES",
"RecentActivity" : "YES",
"GoAgain?" : "YeaSure."
}
I would like to be able to query to retrieve any info in any position,
just by referring to the position. The following document,
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"Sin1" : "Gluttony",
"?????????" : "??????",
"RecentActivity" : "YES",
"GoAgain?" : "YeaSure."
}
One could retrieve whatever might be in the 3rd key~value
pair. Why should one have to know ahead of time what the
data is, in the key? If one has the same structure for the
collection, who needs to know? This way, you can get
double the efficiency? Like having a whole lot of mailboxes,
and your app's users supply the key and the value; your app
just queries the dbs' documents' arrays' positions.
Clearer? Finally? I hope?
The sample document you've provided is not saved as an array in BSON:
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"Sin1" : "Gluttony",
"?????????" : "??????",
"RecentActivity" : "YES",
"GoAgain?" : "YeaSure."
}
Depending on the MongoDB driver you are using, the fields here are typically represented in your application code as an associative array or hash. These data structures are not order-preserving so you cannot assume that the 3rd field in a given document will correspond to the same field in another document (or even that the same field ordering will be consistent on multiple fetches). You need to reference the field by name.
If you instead use an array for your fields, you can refer by position or select a subset of the array using the $slice projection.
Example document with an array of fields:
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"fields": [
{ "Sin1" : "Gluttony" },
{ "?????????" : "??????" },
{ "RecentActivity" : "YES" },
{ "GoAgain?" : "YeaSure." }
]
}
.. and query to find the second element of the fields array (a $slice with skip 1, limit 1):
db.SoManySins.find({}, { fields: { $slice: [1,1]} })
{
"_id" : ObjectId("51671bb6a6a02d7812000018"),
"fields" : [
{
"?????????" : "??????"
}
]
}
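The [skip, limit] form of $slice behaves much like Array.prototype.slice on the stored array. The plain JavaScript sketch below mirrors that for a non-negative skip only; sliceProjection is a hypothetical helper working on an in-memory array, not a driver function.

```javascript
// Mirrors { $slice: [skip, limit] } for skip >= 0: skip elements, then take limit.
function sliceProjection(array, skip, limit) {
  return array.slice(skip, skip + limit);
}

// In-memory copy of the fields array from the example document.
const fields = [
  { Sin1: "Gluttony" },
  { "?????????": "??????" },
  { RecentActivity: "YES" },
  { "GoAgain?": "YeaSure." },
];

// Second element only, as in the query above.
const secondField = sliceProjection(fields, 1, 1);
```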
This is one way to Query and get back data when you may not
know what the data is, but you know the structure of the data:
examples in Mongo Shell, and in PHP
// the basics, setup:
$dbhost = 'localhost'; $dbname = 'test';
$m = new Mongo("mongodb://$dbhost");
$db = $m->$dbname;
$CursorFerWrites = $db->NEWthang;
// defining a set of data, creating a document with PHP:
$TheFieldGenerator = array( 'FieldxExp' => array(
array('Doc1 K1'=>'Val A1','Doc1 K2'=>'Val A2','Doc1 K3'=>'Val A3'),
array('Doc2 K1'=>'V1','Doc2 K2'=>'V2','Doc2 K3'=>'V3' ) ) ) ;
// then write it to MongoDB:
$CursorFerWrites->save($TheFieldGenerator);
NOTE : In the Shell : This produces the same Document:
> db.NEWthang.insert({"FieldxExp" : [
{"Doc1 K1":"Val A1","Doc1 K2":"Val A2","Doc1 K3":"Val A3"},
{"Doc2 K1":"V1", "Doc2 K2":"V2","Doc2 K3":"V3"}
]
})
Now, some mongodb Shell syntax:
> db.NEWthang.find().pretty()
{
"_id" : ObjectId("516c4053baa133464d36e836"),
"FieldxExp" : [
{
"Doc1 K1" : "Val A1",
"Doc1 K2" : "Val A2",
"Doc1 K3" : "Val A3"
},
{
"Doc2 K1" : "V1",
"Doc2 K2" : "V2",
"Doc2 K3" : "V3"
}
]
}
> db.NEWthang.find({}, { "FieldxExp" : { $slice: [1,1]} } ).pretty()
{
"_id" : ObjectId("516c4053baa133464d36e836"),
"FieldxExp" : [
{
"Doc2 K1" : "V1",
"Doc2 K2" : "V2",
"Doc2 K3" : "V3"
}
]
}
> db.NEWthang.find({}, { "FieldxExp" : { $slice: [0,1]} } ).pretty()
{
"_id" : ObjectId("516c4053baa133464d36e836"),
"FieldxExp" : [
{
"Doc1 K1" : "Val A1",
"Doc1 K2" : "Val A2",
"Doc1 K3" : "Val A3"
}
]
}
Finally, here is the same query written in PHP:
// these will be for building the MongoCursor:
$myEmptyArray = array();
$TheProjectionCriteria = array('FieldxExp'=> array('$slice' => array(1,1)));
// which gets set up here:
$CursorNEWthang1 = new MongoCollection($db, 'NEWthang');
// and now ready to make the Query/read:
$ReadomgomgPls=$CursorNEWthang1->find($myEmptyArray,$TheProjectionCriteria);
and the second document will be printed out:
foreach ($ReadomgomgPls as $somekey=>$AxMongoDBxDocFromCollection) {
var_dump($AxMongoDBxDocFromCollection);echo '<br />';
}
Hope this is helpful for a few folks.

Upsert with pymongo and a custom _id field

I'm attempting to store pre-aggregated performance metrics in a sharded mongodb according to this document.
I'm trying to update the minute sub-documents in a record that may or may not exist with an upsert like so (self.collection is a pymongo collection instance):
self.collection.update(query, data, upsert=True)
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
data:
{ 'minute': { '16': { '45': 1.6693091}}}
The problem is that the 'minute' subdocument only ever holds the last hour: { minute: metric } entry. It does not create new entries for other hours; it always overwrites the one entry.
I've also tried this with a $set style data entry:
{ '$set': { 'minute': { '16': { '45': 1.6693091}}}}
but it ends up being the same.
What am I doing wrong?
In both of the examples listed, you are simply setting a field ('minute') to a particular value. The only reason it looks like an addition the first time you update is that the field does not exist yet and so must be created.
It's hard to determine exactly what you are shooting for here, but I think what you could do is alter your schema a little so that 'minute' is an array. Then you could use $push to add values regardless of whether they are already present or $addToSet if you don't want duplicates.
I had to alter your document a little to make it valid in the shell, so my _id (and some other fields) are slightly different to yours, but it should still be close enough to be illustrative:
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
}
}
Now let's add a minute field with an array of documents instead of a single document:
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '16': {'45': 1.6693091}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
}
]
}
Then, to illustrate the addition, add a slightly different entry (since I am using $addToSet, the entry must differ for a new element to be added):
db.foo.update({'_id': 'u12345CHA-2RU020130304'}, { $addToSet : {'minute': { '17': {'48': 1.6693391}}}})
db.foo.find({'_id': 'u12345CHA-2RU020130304'}).pretty()
{
"_id" : "u12345CHA-2RU020130304",
"metadata" : {
"adaptor_id" : "CHA-2RU",
"array_serial" : 12345,
"date" : ISODate("2013-03-18T23:28:50.660Z"),
"processor_id" : 0
},
"minute" : [
{
"16" : {
"45" : 1.6693091
}
},
{
"17" : {
"48" : 1.6693391
}
}
]
}
I ended up setting the fields like this:
query:
{ '_id': u'12345CHA-2RU020130304',
'metadata': { 'adaptor_id': 'CHA-2RU',
'array_serial': 12345,
'date': datetime.datetime(2013, 3, 4, 0, 0, tzinfo=<UTC>),
'processor_id': 0}
}
I'm setting the metrics like this:
data = {"$set": {}}
for metric in csv:
date_utc = metric['date'].astimezone(pytz.utc)
data["$set"]["minute.%d.%d" % (date_utc.hour,
date_utc.minute)] = float(metric['metric'])
which creates data like this:
{"$set": {'minute.16.45': 1.6693091,
'minute.16.46': 1.566343,
'minute.16.47': 1.22322}}
So that when self.collection.update(query, data, upsert=True) is run it updates those fields.
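The difference between setting the whole 'minute' field and setting dotted paths can be imitated on a plain JavaScript object. This is only a simulation of the merge behaviour, under the assumption that a dotted $set creates missing intermediate objects and leaves siblings alone; setPath is a hypothetical helper, not part of any driver.

```javascript
// Sets a value at a dotted path, creating intermediate objects as needed
// and leaving sibling keys untouched.
function setPath(doc, path, value) {
  const keys = path.split(".");
  let node = doc;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) {
      node[key] = {};
    }
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

const original = { minute: { "16": { "45": 1.6693091 } } };

// Setting the whole field replaces the earlier hour entry.
const replaced = JSON.parse(JSON.stringify(original));
replaced.minute = { "17": { "48": 1.6693391 } };

// Setting a dotted path keeps hour 16 and adds hour 17.
const merged = JSON.parse(JSON.stringify(original));
setPath(merged, "minute.17.48", 1.6693391);
```

After this runs, replaced.minute only has hour 17, while merged.minute has both hours 16 and 17.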