Query on all descendants of node - mongodb

Given a collection of MongoDb documents with a property "myContacts" like this:
{
"_id": 123,
"myContacts" : {
"contacts" : {
"10" : {
"_id" : NumberLong(10),
"name" : "c1",
"prop" : true
},
"20" : {
"_id" : NumberLong(20),
"name" : "c2"
},
}
}
}
I want to select all documents, where at least one contact lacks the "prop" field.
I figured out a general query:
db.getCollection('xyz').find({ 'myContacts.contacts.???.prop': { $exists: false } })
The problem is that IDs of the contacts are part of the path and I cannot know them ahead. I want sth like 'myContacts.contacts.$anyChild.prop', but cannot find sth similar in the mongo docs.
Does it mean there is no way to do it?
PS: I cannot change the document structure, a live app uses it. I've spent some time with Google and my bet it's not possible. I however would like an opinion from people who have experience with Mongo.

Thank you guys for helpful comments, this got me going! I could get the results I wanted with:
db.getCollection('xyz').aggregate([{$project: {_id:1, contacts:{$objectToArray: "$myContacts.contacts"}}}, {$match: {"contacts.v.prop" : null}}])

Related

Query MongoDB collection

I have a MongoDB collection like this:
{
"_id" : {
"processUuid" : "653d0937-2afc-4915-ad42-d2b69f344402",
"partnerId" : "p377"
},
"isComplete" : true,
"tasks" : {
"dbb361a7-4b73-4691-bde5-2b160346464f" : {
"sku" : "4060079000048",
"status" : "FAILED",
"errorList" : [....]
},
"790dbc6f-563d-4eb7-931c-3cc604563dc1" : {
"sku" : "4060079000130",
"status" : "SUCCESSFUL",
"errorList" : [....]
},
... more tasks
}
... more processes
}
I want to query a certain sku and couldnt find a way. I know i could write an aggregation pipeline to project the inner part of a task, but then i would lose the task identifier which i need in order to add some stuff to a task.
I know the document structure is weird, i would have done it different, but its given.
I tried what i found about querying nested documents and so on, but unfortunately didnt get it. Any hints are appreciated.

Using $last on Mongo Aggregation Pipeline

I searched for similar questions but couldn't find any. Feel free to point me in their direction.
Say I have this data:
{ "_id" : ObjectId("5694c9eed4c65e923780f28e"), "name" : "foo1", "attr" : "foo" }
{ "_id" : ObjectId("5694ca3ad4c65e923780f290"), "name" : "foo2", "attr" : "foo" }
{ "_id" : ObjectId("5694ca47d4c65e923780f294"), "name" : "bar1", "attr" : "bar" }
{ "_id" : ObjectId("5694ca53d4c65e923780f296"), "name" : "bar2", "attr" : "bar" }
If I want to get the latest record for each attribute group, I can do this:
> db.content.aggregate({$group: {_id: '$attr', name: {$last: '$name'}}})
{ "_id" : "bar", "name" : "bar2" }
{ "_id" : "foo", "name" : "foo2" }
I would like to have my data grouped by attr and then sorted by _id so that only the latest record remains in each group, and that's how I can achieve this. BUT I need a way to avoid naming all the fields that I want in the result (in this example "name") because in my real use case they are not known ahead.
So, is there a way to achieve this, but without having to explicitly name each field using $last and just taking all fields instead? Of course, I would sort my data prior to grouping and I just need to somehow tell Mongo "take all values from the latest one".
See some possible options here:
Do multiple find().sort() queries for each of the attr values you
want to search.
Grab the original _id of the $last doc, then do a findOne() for each of those values (this is the more extensible option).
Use the $$ROOT system variable as shown here.
This wouldn't be the quickest operation, but I assume you're using this more for analytics, not in response to a user behavior.
Edited to add slouc's example posted in comments:
db.content.aggregate({$group: {_id: '$attr', lastItem: { $last: "$$ROOT" }}}).

I have big database on mongodb and can't find and use my info

This my code:
db.test.find() {
"_id" : ObjectId("4d3ed089fb60ab534684b7e9"),
"title" : "Sir",
"name" : {
"_id" : ObjectId("4d3ed089fb60ab534684b7ff"),
"first_name" : "Farid"
},
"addresses" : [
{
"city" : "Baku",
"country" : "Azerbaijan"
},{
"city" : "Susha",
"country" : "Azerbaijan"
},{
"city" : "Istanbul",
"country" : "Turkey"
}
]
}
I want get output only all city. Or I want get output only all country. How can i do it?
I'm not 100% about your code example, because if your 'find' by ID there's no need to search by anything else... but I wonder whether the following can help:
db.test.insert({name:'farid', addresses:[
{"city":"Baku", "country":"Azerbaijan"},
{"city":"Susha", "country":"Azerbaijan"},
{"city" : "Istanbul","country" : "Turkey"}
]});
db.test.insert({name:'elena', addresses:[
{"city" : "Ankara","country" : "Turkey"},
{"city":"Baku", "country":"Azerbaijan"}
]});
Then the following will show all countries:
db.test.aggregate(
{$unwind: "$addresses"},
{$group: {_id:"$country", countries:{$addToSet:"$addresses.country"}}}
);
result will be
{ "result" : [
{ "_id" : null,
"countries" : [ "Turkey", "Azerbaijan"]
}
],
"ok" : 1
}
Maybe there are other ways, but that's one I know.
With 'cities' you might want to take more care (because I know cities with the same name in different countries...).
Based on your question, there may be two underlying issues here:
First, it looks like you are trying to query a Collection called "test". Often times, "test" is the name of an actual database you are using. My concern, then, is that you are trying to query the database "test" to find any collections that have the key "city" or "country" on any of the internal documents. If this is the case, what you actually need to do is identify all of the collections in your database, and search them individually to see if any of these collections contain documents that include the keys you are looking for.
(For more information on how the db.collection.find() method works, check the MongoDB documentation here: http://docs.mongodb.org/manual/reference/method/db.collection.find/#db.collection.find)
Second, if this is actually what you are trying to do, all you need to for each collection is define a query that only returns the key of the document you are looking for. If you get more than 0 results from the query, you know documents have the "city" key. If they don't return results, you can ignore these collections. One caveat here is if data about "city" is in embedded documents within a collection. If this is the case, you may actually need to have some idea of which embedded documents may contain the key you are looking for.

mongo db update subdocument - subdocument not in array

I have a document like this,
{
"S" : {
"500209" : {
"total_income" : 38982,
"interest_income" : 1714,
"reported_eps" : 158.76,
"year" : 201303
}
},
"_id" : "pl"
}
I am trying to update this document like this,
{
"S" : {
"500209" : {
"total_income" : 38982,
"interest_income" : 1714,
"reported_eps" : 158.76,
"year" : 201303,
"yield": 1001, <== inserted a new attribute
}
},
"_id" : "pl"
}
I have tried this,
db.my_collection.update({_id: 'pl'},{$set: {'S.500209.yield': 1}})
But I couldn't make it. And I searched in stack overflow and google but I couldn't find out.
I got so many answers but most of them keeping sub-docuemnts in array.
Pleas help me to solve my issue, and please tell me why most of them keeping subdocuments in array.
Number key might cause the problem. Update your field name.
db.my_collection.update({_id: 'pl'},{$set: {'S.a500209.yield': 1}})
EDIT
Upgrade mongo version. It works fine with 2.4.

MongoDB versioning with timestamps as keys

I'm trying to implement versioning (based on storing the diffs between documents) in MongoDB as described in this post: Ways to implement data versioning in MongoDB.
I'm using the same approach so far but want to make queries like: "Give me all the changes between Date1 and Date2". The schema presented in the post uses just a plain incremental numbering as the key for the changes which doesn't allow this kind of querying:
{
_id : "id of address book record",
changes : {
1234567 : { "city" : "Omaha", "state" : "Nebraska" },
1234568 : { "city" : "Kansas City", "state" : "Missouri" }
}
}
My only guess is that to make time based queries I would have to include some time information in the key value. Perhaps an ObjectId or ISODate like so:
{
_id : "id of address book record",
changes : {
ISODate(Date) : { "city" : "Omaha", "state" : "Nebraska" },
ObjectId(id) : { "city" : "Kansas City", "state" : "Missouri" }
}
}
I googled quite a while for a solution but couldn't find anything really useful. If someone has an idea on how to do this, it would be great if you can help me out. It might even be that this is the completely wrong way to approach the problem and that there is a better way to solve this issue ... I am open to any suggestions!
Thanks for helping me out...
Using a IsoDate (or simple Date) for versioning is a good approach. However, in order to search for it efficiently, you shouldn't use it as a field identifier, but instead make the changes an array and put the date into the sub-documents in this array:
{
_id : "id of address book record",
changes : [
{ "change_date": ISODate(Date), "city" : "Omaha", "state" : "Nebraska" },
{ "change_date": ISODate(Date), "city" : "Kansas City", "state" : "Missouri" }
]
}
This allows you to use the changes.change_date field for queries with $gte or $lte. You can also find the most recent one with the $max aggregation grouping operator (not to be confused with the $max query operator, which does something completely different).
You might also want to add an index to changes.change_date to speed up these queries.