Pymongo query on "subdocuments" - mongodb

Each document in a collection called groups has a field called actives, which is a list of "subdocuments", i.e. objects of the form {key: value}. One field (key) of each subdocument is id_, which is a string.
Across all the subdocuments in all the documents of groups, no two share the same id_, i.e. id_ uniquely identifies each subdocument. Whenever I get a new subdocument, I need to run a program that takes its id_, goes to a website and extracts info about it, including the group that the subdocument belongs to. However, I don't want to run this program if some document in groups already contains a subdocument with the same id_ as the "new" one.
How can I list the ids of all the subdocuments of all the documents (or instances of groups)?
Edit:
Suppose that the documents of the collection groups are:
doc1: {"neighbourhood": "n1", "actives": [{"id_": "MHTEQ", "info": "a_long_string"}, {"id_": "PNPQA", "info": "a_long_string"}]}
doc2: {"neighbourhood": "n2", "actives": [{"id_": "MERVX", "info": "a_long_string"}, {"id_": "ZDKJW", "info": "a_long_string"}]}
What I want to do is to list all the "id_", i.e.
def list_ids(groups):
    do_sth_with_groups
    return a_list

print(list_ids(groups))
output: ["MHTEQ", "PNPQA", "MERVX", "ZDKJW"]

Use the aggregation pipeline with the $unwind and $project operators.
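def list_ids(db):
    results = db['collection'].aggregate(
        [
            # keep only the actives array
            {"$project": {"actives": 1, "_id": 0}},
            # emit one document per subdocument
            {"$unwind": "$actives"},
            # expose each subdocument's id_ as id_str
            {"$project": {"id_str": "$actives.id_", "_id": 0}}
        ]
    )
    return list(results)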
https://docs.mongodb.com/v3.2/reference/operator/aggregation/unwind/
https://docs.mongodb.com/v3.2/reference/operator/aggregation/project/
Sample output
{
"id_str" : "MHTEQ"
}
{
"id_str" : "PNPQA"
}
{
"id_str" : "MERVX"
}
{
"id_str" : "ZDKJW"
}
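If a flat Python list is wanted rather than a list of {"id_str": ...} documents, the result can be flattened on the client. The sketch below is not the answer's code: it mimics $unwind followed by $project on plain Python dicts (the two example documents from the question), so it runs without a MongoDB server. The names unwind_ids and known_ids are made up for illustration.

```python
def unwind_ids(documents):
    """In-memory stand-in for $unwind on 'actives' plus projecting 'id_'."""
    return [sub["id_"] for doc in documents for sub in doc.get("actives", [])]


# The two example documents from the question.
groups = [
    {"neighbourhood": "n1", "actives": [
        {"id_": "MHTEQ", "info": "a_long_string"},
        {"id_": "PNPQA", "info": "a_long_string"},
    ]},
    {"neighbourhood": "n2", "actives": [
        {"id_": "MERVX", "info": "a_long_string"},
        {"id_": "ZDKJW", "info": "a_long_string"},
    ]},
]

# A set makes the "have I already seen this id_?" check cheap.
known_ids = set(unwind_ids(groups))

print(unwind_ids(groups))    # ['MHTEQ', 'PNPQA', 'MERVX', 'ZDKJW']
print("MHTEQ" in known_ids)  # True, so the scraping program can be skipped
```

With a real PyMongo cursor, the same flattening would be [doc["id_str"] for doc in results], assuming the pipeline shown in the answer.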

Related

Get count of a value of a subdocument inside an array with mongoose

I have a collection of documents with id and contact fields. contact is an array which contains subdocuments.
I am trying to get the count of contact entries where isActive = Y. I also need to restrict the query to the document with a given id. The entire query would be something like
Select Count(contact.isActive=Y) where _id = '601ad0227b25254647823713'
I am using mongo and mongoose for the first time. Please edit the question if I was not able to explain it properly.
You can use an aggregation pipeline like this:
First, $match to keep only the document with the desired id (the example below uses a numeric id field; substitute your own _id value).
Then $unwind to output one document per element of the contact array.
$match again to keep only the elements whose isActive value is Y.
Finally, $group, adding one for each remaining document (i.e. counting the contacts with isActive = Y). The count is stored in the field total.
db.collection.aggregate([
  {
    "$match": {"id": 1}
  },
  {
    "$unwind": "$contact"
  },
  {
    "$match": {"contact.isActive": "Y"}
  },
  {
    "$group": {
      "_id": "$id",
      "total": {"$sum": 1}
    }
  }
])
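To make the four stages concrete, here is a rough in-memory Python equivalent. It is only a sketch of what the pipeline computes, not Mongoose or MongoDB code; the sample documents and the function name count_active_contacts are invented for illustration.

```python
from collections import Counter


def count_active_contacts(documents, doc_id):
    """Mimic $match -> $unwind -> $match -> $group with {"$sum": 1}."""
    totals = Counter()
    for doc in documents:
        if doc["id"] != doc_id:                  # first $match
            continue
        for contact in doc.get("contact", []):   # $unwind
            if contact.get("isActive") == "Y":   # second $match
                totals[doc["id"]] += 1           # $group, adding one each time
    return [{"_id": key, "total": total} for key, total in totals.items()]


# Hypothetical sample data.
documents = [
    {"id": 1, "contact": [
        {"name": "a", "isActive": "Y"},
        {"name": "b", "isActive": "N"},
        {"name": "c", "isActive": "Y"},
    ]},
    {"id": 2, "contact": [{"name": "d", "isActive": "Y"}]},
]

print(count_active_contacts(documents, 1))  # [{'_id': 1, 'total': 2}]
```

As in the pipeline, a document with no active contacts produces no group at all, so the result is an empty list rather than a total of 0.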

How to query certain elements of an array of objects? (mongodb)

Say I have a MongoDB collection with records as follows:
{
  email: "person1#gmail.com",
  plans: [
    {planName: "plan1", dataValue: 100},
    {planName: "plan2", dataValue: 50}
  ]
},
{
  email: "person2#gmail.com",
  plans: [
    {planName: "plan3", dataValue: 25},
    {planName: "plan4", dataValue: 12.5}
  ]
}
and I want to query such that only the dataValue is returned, for the record where the email is "person1#gmail.com" and the planName is "plan1". How would I approach this?
You can accomplish this using the Aggregation Pipeline.
The pipeline may look like this:
db.collection.aggregate([
{ $match: { "email" :"person1#gmail.com", "plans.planName": "plan1" }},
{ $unwind: "$plans" },
{ $match: { "plans.planName": "plan1" }},
{ $project: { "_id": 0, "dataValue": "$plans.dataValue" }}
])
The first stage, $match, retrieves documents where the email field is equal to person1#gmail.com and at least one element in the plans array has a planName equal to plan1.
The second stage, $unwind, outputs one document per element in the plans array. In each output document, the plans field is now a single plan object instead of an array.
The third stage, another $match, filters the unwound documents to keep only those with a plans.planName of plan1. Finally, the $project stage excludes the _id field and projects a single dataValue field with the value of plans.dataValue.
Note that with this approach, if the email field is not unique, you may get multiple result documents, each consisting of just a dataValue field.
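The same filtering logic can be traced in plain Python. This is an illustrative sketch over in-memory dicts, not a MongoDB call; the function name data_values is made up, and the records are the ones from the question.

```python
def data_values(documents, email, plan_name):
    """Mimic $match -> $unwind -> $match -> $project on plain dicts."""
    results = []
    for doc in documents:
        if doc["email"] != email:                 # first $match (email part)
            continue
        for plan in doc.get("plans", []):         # $unwind
            if plan["planName"] == plan_name:     # second $match
                results.append({"dataValue": plan["dataValue"]})  # $project
    return results


records = [
    {"email": "person1#gmail.com", "plans": [
        {"planName": "plan1", "dataValue": 100},
        {"planName": "plan2", "dataValue": 50},
    ]},
    {"email": "person2#gmail.com", "plans": [
        {"planName": "plan3", "dataValue": 25},
        {"planName": "plan4", "dataValue": 12.5},
    ]},
]

print(data_values(records, "person1#gmail.com", "plan1"))  # [{'dataValue': 100}]
```

The result is a list because, as noted, more than one record could share the same email.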

How to process mongo documents and get field wise data in array

I'm currently stuck on how to process MongoDB documents and get the values grouped by field. For example, say the collection contains these documents:
[
{ "name": "test1", "age": 20, "gender": "male" },
{ "name": "test2", "age": 21, "gender": "female" },
{ "name": "test3", "age": 30, "gender": "male"}
]
Expected Output:
{
"name": ["test1","test2","test3"],
"age": [20,21,30],
"gender": ["male","female", "male"]
}
Is it possible to retrieve data from MongoDB in the above format? I don't want to write JavaScript functions to process this. I'm looking to retrieve the data using MongoDB functions along with the find query.
You'd need to use the aggregation framework to get the desired result. Run the following aggregation pipeline, which first uses the $match operator to filter the documents that enter the pipeline for grouping. This is similar to the find() query filter.
db.collection.aggregate([
  { "$match": { "age": { "$gte": 20 } } }, // filter on users with age >= 20
  {
    "$group": {
      "_id": null,
      "name": { "$push": "$name" },
      "age": { "$push": "$age" },
      "gender": { "$push": "$gender" }
    }
  },
  {
    "$project": {
      "_id": 0,
      "name": 1,
      "age": 1,
      "gender": 1
    }
  }
])
Sample Output
{
"name": ["test1", "test2", "test3"],
"age": [20, 21, 30],
"gender": ["male", "female", "male"]
}
In the above pipeline, the first pipeline step is the $match operator which is similar to SQL's WHERE clause. The above example filters incoming documents on the age field (age greater than or equal to 20).
One thing to note here is when executing a pipeline, MongoDB pipes operators into each other. "Pipe" here takes the Linux meaning: the output of an operator becomes the input of the following operator. The result of each operator is a new collection of documents. So Mongo executes the previous pipeline as follows:
collection | $match | $group | $project => result
The next pipeline stage is the $group operator. Here you group all the filtered documents; specifying an _id value of null puts every input document into a single group, so the accumulated values are calculated over the input as a whole. Use the available accumulators to return the desired aggregation on the grouped documents. The accumulator operator $push is used in this grouping operation because it returns an array of expression values for each group.
Accumulators used in the $group stage maintain their state (e.g. totals, maximums, minimums, and related data) as documents progress through the pipeline.
Finally, the $project operator, which is similar to SELECT in SQL, selects which of the grouped fields are returned (it can also rename fields). If you specify 0 for a field, it will NOT be passed along the pipeline to the next operator.
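The "pipe" idea can be sketched in plain Python by treating each stage as a function from a list of documents to a new list of documents. This is only an in-memory illustration of the $match | $group | $project flow described above, not how MongoDB is implemented; every function name here is invented for the example.

```python
from functools import reduce


def match_min_age(min_age):
    """Stage resembling {"$match": {"age": {"$gte": min_age}}}."""
    return lambda docs: [d for d in docs if d["age"] >= min_age]


def group_push(fields):
    """Stage resembling $group with _id: null and one $push per field."""
    def stage(docs):
        if not docs:
            return []  # no input documents, so no group is produced
        grouped = {"_id": None}
        for field in fields:
            grouped[field] = [d[field] for d in docs]
        return [grouped]
    return stage


def project_without_id(docs):
    """Stage resembling {"$project": {"_id": 0, ...}}: drop _id, keep the rest."""
    return [{k: v for k, v in d.items() if k != "_id"} for d in docs]


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one."""
    return reduce(lambda current, stage: stage(current), stages, docs)


people = [
    {"name": "test1", "age": 20, "gender": "male"},
    {"name": "test2", "age": 21, "gender": "female"},
    {"name": "test3", "age": 30, "gender": "male"},
]

result = run_pipeline(
    people,
    [match_min_age(20), group_push(["name", "age", "gender"]), project_without_id],
)
print(result)
```

The single document in result has the same shape as the sample output: one array per field, in input order.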
You cannot do this with the find command.
Try using mongodb's aggregation pipeline.
Specifically use $group in combination with $push
See here: https://docs.mongodb.com/manual/reference/operator/aggregation/group/#pipe._S_group

How can I aggregate the documents in an array instead of in a collection in MongoDB?

I have the following collection cidade on my MongoDB database:
{
"_id" : 0,
"nome" : "CIDADE0",
"qtdhab" : 1231043
}
So, here's what I want to do. I'm trying to do the equivalent of this SQL query in MongoDB:
SELECT MAX(QTDHAB) FROM CIDADE WHERE QTDHAB <= (SELECT AVG(QTDHAB) FROM CIDADE);
Basically, I want the biggest value of the field qtdhab from the collection cidade that is lower than or equal to the average value of the same field qtdhab on the same collection cidade. So, I have mapped this into 3 queries, like this:
var resultado = db.cidade.aggregate({"$group": {"_id": null, "avgHab": {"$avg": "$qtdhab"}}}).toArray()
var resultado2 = db.cidade.find({qtdhab: {$lte: resultado[0].avgHab}}).toArray()
resultado2.aggregate({"$group": {"_id": null, "maxHab": {"$max": "$qtdhab"}}})
The problem is, as I found out the hard way, that there is no .aggregate method for an array such as resultado2, so the last query returns an error. Is there any other way for me to get the biggest value for the field qtdhab out of this array of documents that was generated by these 2 queries?
You can achieve this with a single aggregation query:
db.cidade.aggregate([
  /* find the avg and also keep the aggregated field */
  {"$group": {
    "_id": null,
    "qtdhab": {"$push": "$qtdhab"},
    "avgHab": {"$avg": "$qtdhab"}
  }},
  /* unwind the array */
  {"$unwind": "$qtdhab"},
  /* get the max of the values less than or equal to the avg */
  {"$group": {
    "_id": null,
    "res": {"$max": {"$cond": [{"$lte": ["$qtdhab", "$avgHab"]}, "$qtdhab", null]}}
  }}
])
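For comparison, here is the same computation on an in-memory list, which is essentially what the asker's third step was attempting on the array resultado2. It is a plain-Python sketch, not a MongoDB query; the function name and the sample cities are hypothetical.

```python
def max_at_or_below_average(documents, field="qtdhab"):
    """Largest value of `field` that is <= the average of `field`."""
    values = [doc[field] for doc in documents]
    if not values:
        return None  # nothing to aggregate
    average = sum(values) / len(values)
    candidates = [v for v in values if v <= average]
    return max(candidates)  # never empty: the minimum is always <= the average


# Hypothetical sample data in the shape of the cidade collection.
cidades = [
    {"_id": 0, "nome": "CIDADE0", "qtdhab": 100},
    {"_id": 1, "nome": "CIDADE1", "qtdhab": 200},
    {"_id": 2, "nome": "CIDADE2", "qtdhab": 900},
]

# The average is 400, so the answer is the largest value not above 400.
print(max_at_or_below_average(cidades))  # 200
```

Doing this on the client means transferring every document first, which is the cost the single aggregation query avoids.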

Labelling collections in MongoDB

I have two collections: persons (millions of documents) and groups. When creating a group I specify a rule, which is actually the criteria used to find persons. Now, what I want to do is to add the group's _id to all the matching persons.
The request to my API:
POST /groups {
"rule": {"age": {"$gt": 18}},
"description": "persons above 18"
}
On my MongoDB:
db.persons.find({"age": {"$gt": 18}})
Now I want to add the group _id to a groups array field in each of the matching persons, so that I can later get all persons in the group. Can this be done directly in the same query?
Maybe I'm missing something, but a simple update statement should do it:
db.persons.update(
{ "age" : {$gt : 18} },
{ $addToSet : { "groups" : groupId }},
false, // no upsert ($addToSet and $push still add the field)
true); // update multiple docs in this query
Note that $push will add the same value over and over, while $addToSet will ensure uniqueness in the array, which probably makes sense in this case.
You'll have to find/insert the group itself in a different statement, though.
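The difference between $addToSet and $push is easiest to see when the same update runs twice. The sketch below imitates the multi-document $addToSet update on plain Python dicts; it is an in-memory illustration only, and the function name, sample persons and group ids are all invented.

```python
def add_group_to_matching(persons, min_age, group_id):
    """Imitate a multi-document update with {"$addToSet": {"groups": group_id}}
    for persons whose age is greater than min_age.
    Returns how many persons were actually changed."""
    modified = 0
    for person in persons:
        if "age" in person and person["age"] > min_age:
            groups = person.setdefault("groups", [])  # the field is created if missing
            if group_id not in groups:                # set semantics: no duplicates
                groups.append(group_id)
                modified += 1
    return modified


# Hypothetical sample data.
persons = [
    {"name": "ann", "age": 30},
    {"name": "bob", "age": 17},
    {"name": "cy", "age": 45, "groups": ["g0"]},
]

first = add_group_to_matching(persons, 18, "g1")
second = add_group_to_matching(persons, 18, "g1")
print(first, second)  # 2 0
```

Running the update a second time changes nothing, whereas a $push-style append would have added "g1" again. In PyMongo, the corresponding call would presumably be persons.update_many(rule, {"$addToSet": {"groups": group_id}}), where rule and group_id stand for the values from the question.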