I have two collections: persons (millions of documents) and groups. When creating a group I supply a rule, which is really just the criteria for finding matching persons. What I want to do is add the group's _id to all the matching persons.
The request to my API:
POST /groups
{
    "rule": {"age": {"$gt": 18}},
    "description": "persons above 18"
}
On my MongoDB:
db.persons.find({"age": {"$gt": 18}})
Now I want to add the group _id to a groups array field in each of the matching persons, so that I can later get all persons in the group. Can this be done directly in the same query?
Maybe I'm missing something, but a simple update statement should do it:
db.persons.update(
    { "age" : { $gt : 18 } },
    { $addToSet : { "groups" : groupId } },
    false,  // no upsert ($addToSet and $push still add the field)
    true    // update multiple docs with this query
);
Note that $push will add the same value over and over, while $addToSet will ensure uniqueness in the array, which probably makes sense in this case.
You'll have to find/insert the group itself in a different statement, though.
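To make the $push vs. $addToSet distinction concrete, here is a rough Python analogy of the two operators' semantics (the helper names are mine; Mongo of course applies these server-side):

```python
# Python analogy for MongoDB's array update operators (illustrative only).

def push(arr, value):
    """$push: always appends, allowing duplicates."""
    arr.append(value)
    return arr

def add_to_set(arr, value):
    """$addToSet: appends only if the value is not already present."""
    if value not in arr:
        arr.append(value)
    return arr

group_id = "g1"  # hypothetical group _id

push(push([], group_id), group_id)             # -> ["g1", "g1"]
add_to_set(add_to_set([], group_id), group_id) # -> ["g1"]
```

Running the same update twice with $push would therefore duplicate the group id, while $addToSet keeps the array a true set.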
Related
I have 3 collections :
users
mappers
seens
Here is a document in users, where "_id" is the id of a user and the ids array contains a list of other users' _ids:
{
_id: "uid",
ids: [
"uid0",
"uid5",
...
"uid100"
]
}
A document in seens looks exactly like one in users, but its ids array contains the ids of mappers that have been seen by the user; its "_id" is that of the user who owns the array.
Here is a mapper, where "_id" is the ID of a user and map.id is an id that may appear in the ids field of a seens document:
{
_id: "uid",
at: 1453592,
map: {
id: "uid",
...
}
}
I want to retrieve all mappers that meet some conditions :
_id must be in the ids of the user
at must be $lt now and $gt a given value (which is lower than now)
map.id must not be in ids of the seens of the user
The query looks like this :
{
    "_id": {"$in": ids},
    "$and": [
        {"at": {"$lt": now}},
        {"at": {"$gt": start_date}},
        {"map.id": {"$nin": seens}}
    ]
}
Where ids is the array of the user ids and seens is the array of the mappers already seen.
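In pymongo terms the filter above is just a dict; here is a sketch with placeholder values (ids, seens, now and start_date stand for the real values described above):

```python
# Building the filter document described above (all values are placeholders).
ids = ["uid0", "uid5"]        # the user's ids array
seens = ["uid7"]              # mapper ids the user has already seen
now, start_date = 1453592, 1000000

query = {
    "_id": {"$in": ids},
    "$and": [
        {"at": {"$lt": now}},
        {"at": {"$gt": start_date}},
        {"map.id": {"$nin": seens}},
    ],
}
# e.g. mappers = db.mappers.find(query)
```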
I have experimented with this query, and it works fine with a few thousand records.
However, with 10,000 ids, 10,000 seens and 10,000 mappers, the query takes 15 seconds.
I added an index on at (descending) and map.id (ascending); it now takes 8 seconds.
I know that as my collections grow, this will only take longer and longer.
How can I make it always return results in under a second, no matter how many documents my collections hold?
The underlying question is: how do I keep a query using $in and $nin efficient at scale?
Consider a collection whose documents look like this:
{
"name": "some-name",
"age": 23,
"foods" : ["pizza", "cola", "bread", "hotdog"]
}
What I need is to find all documents that have at least one food item matching a search string, for example "pi".
In other words, I want one of the array items to match the query string.
You can do this easily with $regex.
So, using this query:
db.collection.find({
"foods": {
"$regex": "pi"
}
})
Mongo will find all documents where the foods field contains at least one item matching the regex "pi".
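Sketched in plain Python, this is the matching rule Mongo applies: a query condition on an array field matches the document if any element matches, and the regex "pi" is unanchored, so it matches anywhere in the string:

```python
import re

docs = [
    {"name": "some-name", "age": 23,
     "foods": ["pizza", "cola", "bread", "hotdog"]},
    {"name": "other", "age": 30, "foods": ["rice", "beans"]},
]

pattern = re.compile("pi")  # same unanchored pattern as {"$regex": "pi"}

# A document matches when at least one array element matches the regex.
matches = [d for d in docs if any(pattern.search(f) for f in d["foods"])]
# -> only the first document ("pizza" contains "pi")
```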
Following is my MongoDB query to show the organization listing along with the user count per organization. As per my data model, the "users" collection has an array userOrgMap which maintains the organizations (by orgId) to which the user belongs. The "organizations" collection doesn't store the list of assigned users in its documents. The "users" collection has 11,200 documents and "organizations" has 10,500 documents.
db.organizations.aggregate([
    { $lookup : { from : "users", localField : "_id", foreignField : "userOrgMap.orgId", as : "user" } },
    { $project : { _id : 1, name : 1, "noOfUsers" : { $size : "$user" } } },
    { $sort : { noOfUsers : -1 } },
    { $limit : 15 },
    { $skip : 0 }
]);
Without the sort, the query works fast. With the sort, the query works very slow. It takes around 200 secs.
I even tried another way which is also taking more time.
db.organizations.aggregate([
    { $lookup : { from : "users", localField : "_id", foreignField : "userOrgMap.orgId", as : "user" } },
    { $unwind : "$user" },
    { $group : { _id : "$_id", name : { $first : "$name" }, userCount : { $sum : 1 } } },
    { $sort : { userCount : -1 } },
    { $limit : 15 },
    { $skip : 0 }
]);
For the above query, without the $sort itself takes more time.
Need help on how to solve this issue.
Try to get the aggregation to use an index that begins with noOfUsers, since I do not see a $match stage here to narrow things down.
The problem is resolved. I created an index on "userOrgMap.orgId". The query is fast now.
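As a rough analogy for why that index helps: without it, every one of the 10,500 $lookup executions has to scan all 11,200 users; with an index on the foreignField, each lookup becomes a direct probe. A Python sketch of the idea (real MongoDB indexes are B-trees, not dicts):

```python
# Rough analogy: an index on the $lookup foreignField replaces a
# per-document collection scan with a direct probe.

users = [
    {"_id": "u1", "userOrgMap": [{"orgId": "o1"}, {"orgId": "o2"}]},
    {"_id": "u2", "userOrgMap": [{"orgId": "o2"}]},
]

# Unindexed $lookup: scan every user for every organization.
def lookup_scan(org_id):
    return [u for u in users
            if any(m["orgId"] == org_id for m in u["userOrgMap"])]

# "Indexed": build a multikey map from orgId to users once, then probe it.
index = {}
for u in users:
    for m in u["userOrgMap"]:
        index.setdefault(m["orgId"], []).append(u)

def lookup_indexed(org_id):
    return index.get(org_id, [])
```

Both return the same users for a given orgId; the indexed version just skips the scan, which is where the speedup comes from.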
Each document of a collection called groups has a field called actives, which is a list of subdocuments, i.e. things of the form {key: value}. One field (key) of each subdocument is id_, which is a string.
Across all subdocuments in all documents of groups, no two share the same id_, i.e. id_ uniquely identifies each subdocument. Now, when I receive a new subdocument, I need to run a program with its id that goes to a website and extracts info about it; within this info I find the group the subdocument belongs to. However, I don't want to run this program if some document of groups already contains a subdocument with the same id_ as the "new" one.
How can I list the ids of all the subdocuments of all the documents (or instances of groups)?
Edit:
Suppose that the documents of the DB groups are:
doc1: {"neighbourhood": "n1", "actives": [{"id_": "MHTEQ", "info": "a_long_string"}, {"id_": "PNPQA", "info": "a_long_string"}]}
doc2: {"neighbourhood": "n2", "actives": [{"id_": "MERVX", "info": "a_long_string"}, {"id_": "ZDKJW", "info": "a_long_string"}]}
What I want to do is to list all the "id_", i.e.
def list_ids(groups):
do_sth_with_groups
return a_list
print(list_ids(groups))
output: ["MHTEQ", "PNPQA", "MERVX", "ZDKJW"]
Use the aggregation pipeline with the $unwind and $project operators.
results = db['collection'].aggregate(
[
{"$project": {"actives": 1, "_id": 0}},
{"$unwind": "$actives"},
{"$project": {"id_str": "$actives.id_", "_id": 0}}
]
)
print(list(results))
https://docs.mongodb.com/v3.2/reference/operator/aggregation/unwind/
https://docs.mongodb.com/v3.2/reference/operator/aggregation/project/
Sample output
{
"id_str" : "MHTEQ"
}
{
"id_str" : "PNPQA"
}
{
"id_str" : "MERVX"
}
{
"id_str" : "ZDKJW"
}
I have two collections, Buildings and Orders. A Building can have many Orders (a 1:N relation).
I'm trying to build a "top ten" statistic (which Buildings have the most Orders) with the aggregation framework.
My problem is: how can I get the total Orders per Building? Is there a way to "mix" data from two collections in one aggregation?
Currently I'm doing something like this:
db.buildings.aggregate( [
    { $group : { _id : { street : "$street",
                         city : "$city",
                         orders_count : "$orders_count" } } },
    { $sort : { "_id.orders_count" : -1 } },
    { $limit : 10 }
] );
But in this case orders_count is a pre-calculated value. It works, but it is very inefficient and too slow for "live" aggregation.
Is there a way to count the related orders per building directly in the aggregation (I'm sure there is one...)?
Many Thanks
You don't say how orders relate to buildings in your schema but if an order has a building id or name it references, just group by that:
db.orders.aggregate([
    { $group : { _id : "$buildingId", sum : { $sum : 1 } } },
    { $sort : { sum : -1 } },   // sort by count, descending
    { $limit : 10 }             // like you already have
])
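The grouping logic itself, sketched in plain Python (buildingId is the assumed reference field, as in the pipeline above):

```python
from collections import Counter

# Hypothetical orders, each referencing its building by buildingId.
orders = [
    {"_id": 1, "buildingId": "b1"},
    {"_id": 2, "buildingId": "b2"},
    {"_id": 3, "buildingId": "b1"},
]

# $group by buildingId with {$sum: 1}, then $sort by count desc, $limit 10.
counts = Counter(o["buildingId"] for o in orders)
top_ten = counts.most_common(10)
# -> [("b1", 2), ("b2", 1)]
```

To display street/city for the top buildings you would then fetch those ten building documents by _id, which stays cheap because the expensive counting runs over orders alone.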