Find deep children in MongoDB - mongodb

I've searched the docs and web throughly but cannot find any solution for this seemingly trivial problem. Let's say we have a collection called "states". The states collection has a document that looks like this:
{
'name': 'New York',
'towns': [
{
'name': 'Town A',
'buildings':[
{
'name': 'Town Hall',
'open_days': [False, True, True, True, True, True, False],
},
{
'name': 'Fire Department',
'open_days': [True, True, True, True, True, True, True],
},
{
'name': 'Car Dealership',
'open_days': [False, True, True, True, True, True, True],
}
]
},
{
'name': 'Town B',
'buildings':[
{
'name': 'Town Hall',
'open_days': [False, True, True, True, True, True, False],
},
{
'name': 'Police Department',
'open_days': [True, True, True, True, True, True, True],
},
{
'name': 'Karate Dojo',
'open_days': [False, True, False, True, False, True, False],
}
]
}
]
}
In this data, open_days represents whether a building is open or closed on a particular day of the week (index 0 being Sunday for example).
I just want to query the data to pull up all the buildings that are open on Sunday (index 0).
I've tried several odd syntaxes to no avail, is there an easy solution to this? If it matters, I am querying using PyMongo.

It actually depends. First of all there are a few ways to restructure your data a little bit to make it more easy to query and to extract. You don't have to restructure it at all, but if you can I think it would be beneficial.
Some refactoring suggestions:
First of all you might want to split this single large document into multiple documents. Just take out documents from array towns and make each one a standalone document in your collection. Add state name to each. Embedding docs in Mongo is a good idea, but it seems to me that this is a bit over too much. I would even go further extracting each building into a top level document. The reasons for this are: you might not be able to fit all buildings in a large city into a single document (size limit 16Mb). Even if you are able to fit all buildings into a single document your queries will be quite inefficient if you want to select only small part of that document.
Change your array to hold integers corresponding to indexes of weekdays, i.e. Sunday == 0, etc. This will make it simpler and more efficient to query.
So your data would look this way:
{
"stateName": "New York",
"townName": "Town A",
"buildingName": "Town Hall",
"open_days": [1, 2, 3, 4, 5]
}
Now you can find a building in a city that's open on Monday (1):
db.buildings.find({"stateName": "New York", "townName": "Town A", "open_days": 1});
If you don't want to restructure your data take a look at $elemMatch here which projects matched element of an array. You can also take a look at $ here. However, both these operators do not solve your problem because they return first matched element in the array.
You can also use aggregation framework, but you'll be limited to number of results that fit in a single doc (16Mb), won't be able to have a cursor to results, it will be slower and it's over complicated for what you want to do.

What you could use here is aggregation framework. You will find lots of nice examples on aggregation framework in MongoDB docs - Aggregation
What you could be interested in in particular would be $unwind pipeline operator which peels off the elements of an array individually, and returns a stream of documents. $unwind returns one document for every member of the unwound array within every source document.
Here is an example, I hope it provides what you expect it to do:
> db.collection.aggregate(
... { $match : { "towns.buildings.open_days.0" : true }},
... { $unwind : "$towns"},
... { $unwind : "$towns.buildings"},
... { $match : { "towns.buildings.open_days.0" : true }}
... )
{
"result" : [
{
"_id" : ObjectId("52e5ba3beb849fd797d10839"),
"name" : "New York",
"towns" : {
"name" : "Town A",
"buildings" : {
"name" : "Fire Department",
"open_days" : [
true,
true,
true,
true,
true,
true,
true
]
}
}
},
{
"_id" : ObjectId("52e5ba3beb849fd797d10839"),
"name" : "New York",
"towns" : {
"name" : "Town B",
"buildings" : {
"name" : "Police Department",
"open_days" : [
true,
true,
true,
true,
true,
true,
true
]
}
}
}
],
"ok" : 1
}

Related

Mongodb Index behavior different in PROD and Testing environment

I have 2 dedicated Mongo clusters which have the same exact Model, Indexes and we query both envs the same way but the result is different.
user.model.js
const schema = mongoose.Schema({
_id: ObjectId,
role: {
type: String,
enum: ['user', 'admin'],
required: true,
},
score: { type: Number, default: 0 },
deactivated: { type: Date },
});
schema.index(
{ deactivated: 1, role: 1, score: -1 },
{ name: 'search_index', collation: { locale: 'en', strength: 2 } }
);
I noticed that one of our common queries was causing issues on the PROD environment.
The query looks like this:
db.getCollection('users')
.find({deactivated: null, role: 'user'})
.sort({score: -1})
.limit(10)
.collation({locale: 'en', strength: 2})
On the Testing Environment the query runs as expected fully utilizing the index. (has ~80K records total, 1300 deactivated)
But in our PROD env the query, seems to be using only the first part of the compound index. (has ~50K records total, ~20K records deactivated)
The executionStats looks like:
As we can see it is using at least the first part of the index to only search in non-deactivated records, but the SORT is in memory.
This is a legacy application so the first thing I did was ensure that the types of the indexed fields are following the schema in all the records.
I wonder if it could be the "role" collation somehow?
Any hint or clue will be greatly appreciated. Thanks in advance.
Thanks for providing the plans. It is a combination of a few things (including the multikeyness of the production index) that is causing the problem.
There are a few ways to potentially solve this, let's start with the obvious question. Is score supposed to be an array?
The schema suggests not. With MongoDB, an index becomes multikey once a single document is inserted that has an array (even empty) for a key in the index. There is no way to way to "undo" this change apart from rebuilding the index. If the field is not supposed to contain an array, then I would suggest fixing any documents that contain the incorrect data and then rebuilding the index. As this is production, you may want to build a temporary index to reduce the impact to the application while the original index is dropped and recreated. You may also want to look into schema validation to help prevent incorrect data from getting inserted in the future.
If score can be an array, then we'll need to take a different approach. We can see in the UAT plan that a SORT_MERGE is used. The only reason that stage is required is because {"deactivated" : null} seems to have an additional index bound looking for undefined. That may be some internal implementation quirk as that BSON type appears to be deprecated. So updating the data to have an explicit false value for this field and using that check in the query predicate (rather than a check for null) will remove the need to split the plan out with a SORT_MERGE and will probably allow the multikey index to provide the sort:
winningPlan: {
stage: 'LIMIT',
limitAmount: 10,
inputStage: {
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
keyPattern: { deactivated: 1, role: 1, score: -1 },
indexName: 'search_index',
collation: {
locale: 'en',
caseLevel: false,
caseFirst: 'off',
strength: 2,
numericOrdering: false,
alternate: 'non-ignorable',
maxVariable: 'punct',
normalization: false,
backwards: false,
version: '57.1'
},
isMultiKey: true,
multiKeyPaths: { deactivated: [], role: [], score: [ 'score' ] },
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'forward',
indexBounds: {
deactivated: [ '[false, false]' ],
role: [
'[CollationKey(0x514d314b0108), CollationKey(0x514d314b0108)]'
],
score: [ '[MaxKey, MinKey]' ]
}
}
}
}

Finding a nested document and pushing a document into a nested document

So i started learning mongoDB and moongoose today. I have the following Schema:
{
username: {
type : String,
required : true,
unique: true,
trim : true
},
routines : {
type: [
{
_id : mongoose.Schema.Types.ObjectId,
name : String,
todos : [
{
_id : mongoose.Schema.Types.ObjectId,
todo : String,
isCompleted : Boolean
}
]
}
],
}
}
for example:
{
"username": "John",
"routines": [
{
"_id" : 1234, //just for an example assume it to be as 1234
"name" : "Trip plan",
"todos" : [
{
"_id": 1213123,
"todo": "book flight",
"isCompleted" : "false"
}....
]
}......
]
}
what i want to do is, first i would select a document from collection, by using username, then in routines(Array of objects), i want to select a particular routine by id, which would be given by user in request body, now in the selected routine, i want to push into todos array.
For above example, suppose, selecting document with username john, then selecting an object from routines array with _id as 1234, then to its todos i will add a todo.
I have spent almost a day searching about how to do this, while doing this i learnt concepts like arrayFilters, projections. But still couldn't get how to do. Also i read many answers in Stack Overflow, i couldn't grasp much out of them.
PS:I am very new to MongoDB and mongoose, Chances that my question is very silly, and might not be good as per stack overflow standards, I apologize for the same.
You were on the right track, you indeed want to use arrayFilter to achieve this.
Here is a quick example:
// the user you want.
let user = user;
await db.routines.updateOne(
{
username: user.username
},
{
$addToSet: {
"routines.$[elem].todos": newTodo
}
},
{arrayFilters: [{'elem._id': "1234"}]}
);

How to write a query to get

I have following collection with me from MongoDB:
{ "results": [
{
"isExist": true,
"isJourneyEnd": true,
"objectId": "9WtZcxWttk",
"sentTo": [
"JeLRe4yH9R"
],
},
{
"isExist": false,
"isJourneyEnd": true,
"objectId": "9WtZcxWtul",
"sentTo": [
"JeLRe4y9HU"
],
}
]}
In actual, there are many entries in this collection, I've just mentioned two.
If I want to write a query for following statement:
"Print element of array whose isExist is true".
I would like to have some guidance over this, as I am new to MongoDB.
Try this :
db.collection.find({isExist: true});

MongoDB Index on field with multiple identifications as nested objects

I have a mongoDB collection containing items that can be identified through multiple identification schemes
{
"identification" : {
"SCHEME1" : [ "9181983" ],
"SCHEME2" : [ "ABC" , "CDE" ],
"SCHEME4" : ["FDE"]
}
}
{
"identification" : {
"SCHEME2" : [ "LALALAL" ],
"SCHEME5" : [ "CH98790789879" ]
}
},
An item will most likely have not all identification schemes, some have (like the example above ) 1-2-4 others may have different ones. The number of identification schemes is not finally defined and will grow. Every identification can only exists once.
I want to perform two different queries:
Seach an item with scheme and identification, e.g.
db.item.find({"identification.SCHEME2": "CDE"})
Seach all items with a specific identification scheme, e.g.
db.item.find({"identification.SCHEME2": {$exists: true}})
My approach was to create sparse indexes:
db.item.createIndex( { identification.SCHEME1: 1 }, { sparse: true, unique: true} );
db.item.createIndex( { identification.SCHEME2: 1 }, { sparse: true, unique: true } );
db.item.createIndex( { identification.SCHEME3: 1 }, { sparse: true, unique: true } );
and so on ....
This approach worked perfectly until I found out that there is a limit of 64 indexes on one collection in mongoDB.
Has anyone an idea how I could index the whole field "identification" with one index ? Or is my document structure wrong ? Any ideas are welcome, thanks.
I encountered the same problem in a reporting db that had dimensions that I wanted to use in the find clause. The solution was to use a fixed field to hold the data as a k/v pair and index on that.
In your case:
{
"identification" : [
{"k":"SCHEME1", "v":[ "9181983" ]},
{"k":"SCHEME2", "v":[ "ABC" , "CDE" ]},
{"k":"SCHEME4", "v":["FDE"]}
]
}
If you now create a compound index over {"identification.k":1, "identification.v":1} you can search it with the index like:
db.item.find({"identification.k":"SCHEME2", "identification.v":"CDE"})
Downside is you need to update your schema...

MongoDB Full Text on an Array within an Array of Elements

I am unable to retrieve documents when an array within an array of elements contains text that should match my search.
Here are two example documents:
{
_id: ...,
'foo': [
{
'name': 'Thing1',
'data': {
'text': ['X', 'X']
}
},{
'name': 'Thing2',
'data': {
'text': ['X', 'Y']
}
}
]
}
{
_id: ...,
'foo': [
{
'name': 'Thing3',
'data': {
'text': ['X', 'X']
}
},{
'name': 'Thing4',
'data': {
'text': ['X', 'Y']
}
}
]
}
By using the following query, I am able to return both documents:
db.collection.find({'foo.data.text': {'$in': ['Y']}}
However, I am unable to return these results using the full text command/index:
db.collection.runCommand("text", {search" "Y"})
I am certain that the full text search is working, as the same command issuing a search against "Thing1" will return the first document, and "Thing3" returns the second document.
I am certain that both foo.data.text and foo.name are both in the text index when using db.collection.getIndexes().
I created my index using: db.collection.ensureIndex({'foo.name': 'text', 'foo.data.text': 'text'}). Here are the indexes as shown by the above command:
{
"v" : 1,
"key" : {
"_fts" : "text",
"_ftsx" : 1
},
"ns" : "testing.collection",
"background" : true,
"name" : "my_text_search",
"weights" : {
"foo.data.text" : 1,
"foo.name" : 1,
},
"default_language" : "english",
"language_override" : "language",
"textIndexVersion" : 1
}
Any suggestion on how to get this working with mongo's full text search?
Text search does not currently support indexed fields of nested arrays (at least not explicitly specified ones). An index on "foo.name" works fine as it is only one array deep, but the text search will not recurse through the subarray at "foo.data.text". Note that this behavior may change in the 2.6 release.
But fear not, in the meantime nested arrays can be text-indexed, just not with individually specified fields. You may use the wildcard specifier $** to recursively index ALL string fields in your collection, i.e.
db.collection.ensureIndex({"$**": "text" }
as documented at http://docs.mongodb.org/manual/tutorial/create-text-index-on-multiple-fields/ . Be careful though as this will index EVERY string field and could have negative storage and performance consequences. The simple document structure you describe though should work fine. Hope that helps.