MongoDB create index for boolean and integer fields

user collection
[{
deleted: false,
otp: 3435,
number: '+919737624720',
email: 'Test#gmail.com',
name: 'Test child name',
coin: 2
},
{
deleted: false,
otp: 5659,
number: '+917406732496',
email: 'anand.satyan#gmail.com',
name: 'Nivaan',
coin: 0
}
]
I am using the command below to create the index. It seems to work for the string fields,
but I am not sure it is correct for the number and boolean fields.
db.users.createIndex({name:"text", email: "text", coin: 1, deleted: 1})
I am using this command to filter data:
db.users.find({$text:{$search:"anand.satya"}}).pretty()
db.users.find({$text:{$search:"test"}}).pretty()
db.users.find({$text:{$search:2}}).pretty()
db.users.find({$text:{$search:false}}).pretty()
The string fields work, but the numeric and boolean fields do not.
Please advise how I should create an index for them.

The title and comments in this question are misleading. Part of the question is more focused on how to query with fields that contain boolean and integer fields while another part of the question is focused on overall indexing strategies.
Regarding indexing, the index that was shown in the question is perfectly capable of satisfying some queries that include predicates on coin and deleted. We can see that when looking at the explain output for a query of .find({$text:{$search:"test"}, coin:123, deleted: false}):
> db.users.find({$text:{$search:"test"}, coin:123, deleted: false}).explain().queryPlanner.winningPlan.inputStage
{
stage: 'FETCH',
inputStage: {
stage: 'IXSCAN',
filter: {
'$and': [ { coin: { '$eq': 123 } }, { deleted: { '$eq': false } } ]
},
keyPattern: { _fts: 'text', _ftsx: 1, coin: 1, deleted: 1 },
indexName: 'name_text_email_text_coin_1_deleted_1',
isMultiKey: false,
isUnique: false,
isSparse: false,
isPartial: false,
indexVersion: 2,
direction: 'backward',
indexBounds: {}
}
}
Observe here that the index scan stage (IXSCAN) is responsible for applying the filter for the coin and deleted predicates (as opposed to the database having to do that after FETCHing the full document).
Separately, you mentioned in the question that these two particular queries aren't working:
db.users.find({$text:{$search:2}}).pretty()
db.users.find({$text:{$search:false}}).pretty()
And by 'not working' you are referring to the fact that no results are being returned. This is also related to the following discussion in the comments which seemed to have a misleading takeaway:
You'll have to convert your coin and deleted fields to string, if you want it to be picked up by $search – Charchit Kapoor
So there is no way to search a boolean or integer field? – Kiran S youtube channel
Nope, not that I know of. – Charchit Kapoor
You can absolutely use boolean and integer values in your query predicate to filter data. This playground demonstrates that.
What @Charchit Kapoor is saying cannot be done is using the $text operator to match and return results whose field values are not strings. Said another way, the $text operator is specifically used to perform a text search.
If what you are trying to achieve are direct equality matches for the field values, both strings and otherwise, then you can delete the text index as there is no need for using the $text operator in your query. A simplified query might be:
db.users.find({ name: "test"})
Demonstrated in this playground.
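As a rough sketch of the distinction described above, the helper below assembles a filter document in which only a string is handed to $text, while coin and deleted are matched with ordinary equality in their native types. The name buildUserFilter is hypothetical (it is not part of MongoDB or any driver), and the sketch assumes the text index from the question exists whenever a search term is supplied.

```javascript
// Hypothetical helper (an illustration, not a MongoDB API): builds a filter
// document for db.users.find(). Only strings are sent to $text; numbers and
// booleans are matched with plain equality on their own fields.
function buildUserFilter({ search, coin, deleted } = {}) {
  const filter = {};
  if (typeof search === "string" && search.length > 0) {
    filter.$text = { $search: search }; // $search expects a string
  }
  if (typeof coin === "number") {
    filter.coin = coin; // numeric equality, no $text involved
  }
  if (typeof deleted === "boolean") {
    filter.deleted = deleted; // boolean equality, no $text involved
  }
  return filter;
}

// The result can be passed to db.users.find(...) in the shell or a driver.
console.log(JSON.stringify(buildUserFilter({ coin: 2, deleted: false })));
```

Keeping the typed predicates outside of $text is what allows a query such as { coin: 2 } to return results, provided the values are stored as numbers and booleans rather than strings.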
A few additional things come to mind:
Regarding indexing overall, databases will generally consider using an index if the first key is used in the query. You can read more about this for MongoDB specifically on this page. The takeaway is that you will want to create the appropriate set of indexes to align with your most commonly executed queries. If you have a query that just filters on coin, for example, then you may wish to create an index that has coin as its first key.
If you want to check if the exact string value is present in multiple fields, then you may want to do so using the $or operator (and have appropriate indexes for the database to use).
If you do indeed need more advanced text searching capabilities, then it would be appropriate to either continue using the $text operator or consider Atlas Search if the cluster is running in Atlas. Doing so does not prevent you from also having indexes that would support your other queries, such as on { coin: 2 }. It's simply that the syntax for performing such a query needs to be updated.
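For the $or suggestion in the list above, a minimal sketch might look like the following. The helper name exactMatchInAnyField is hypothetical, and the field names are taken from the sample documents in the question. For this to be efficient, each branch would need a suitable index of its own (for example one on name and one on email), which is an assumption about your setup rather than something the helper creates.

```javascript
// Hypothetical helper: builds a filter that matches documents in which at
// least one of the listed fields is exactly equal to the given value.
function exactMatchInAnyField(value, fields) {
  if (!Array.isArray(fields) || fields.length === 0) {
    // $or requires a non-empty array of conditions
    throw new Error("at least one field name is required");
  }
  return { $or: fields.map((field) => ({ [field]: value })) };
}

// The result can be passed to db.users.find(...).
console.log(JSON.stringify(exactMatchInAnyField("Nivaan", ["name", "email"])));
```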
There is a lot going on here, but the big takeaway is that you can absolutely filter data based on any data type. Doing so simply requires using the appropriate syntax, and doing so efficiently requires an appropriate indexing strategy to be used alongside the queries.

Related

Mongoose findOne not working as expected on nested records

I've got a collection in MongoDB whose simplified version looks like this:
Dealers = [{
Id: 123,
Name: 'Someone',
Email: 'someone#somewhere.com',
Vehicles: [
{
Id: 1234,
Make: 'Honda',
Model: 'Civic'
},
{
Id: 2345,
Make: 'Ford',
Model: 'Focus'
},
{
Id: 3456,
Make: 'Ford',
Model: 'KA'
}
]
}]
And my Mongoose Model looks a bit like this:
const vehicle_model = mongoose.Schema({
Id: {
type: Number
},
Email: {
type: String
},
Vehicles: [{
Id: {
type: Number
},
Make: {
type: String
},
Model: {
type: String
}
}]
})
Note the Ids are not MongoDB Ids, just distinct numbers.
I try doing something like this:
const response = await vehicle_model.findOne({ 'Id': 123, 'Vehicles.Id': 1234 })
But when I do:
console.log(response.Vehicles.length)
It returns all the nested Vehicles records instead of the one I'm after.
What am I doing wrong?
Thanks.
This question is asked very frequently. Indeed someone asked a related question here just 18 minutes before this one.
When you query the database, you are requesting that it identify and return matching documents to the client. That is an entirely separate action from asking it to transform the shape of those documents before they are sent back to the client.
In MongoDB, the latter operation (transforming the shape of the document) is usually referred to as "Projection". Simple projections, specifically just returning a subset of the fields, can be done directly in find() (and similar) operations. Most drivers and the shell use the second argument to the method as the projection specification, see here in the documentation.
Your particular case is a little more complicated because you are looking to trim off some of the values in the array. There is a dedicated page in the documentation titled Project Fields to Return from Query which goes into more detail about different situations. Indeed near the bottom is a section titled Project Specific Array Elements in the Returned Array which describes your situation more directly. In it is where they describe usage of the positional $ operator. You can use that as a starting place as follows:
db.collection.find({
"Id": 123,
"Vehicles.Id": 1234
},
{
"Vehicles.$": 1
})
Playground demonstration here.
If you need something more complex, then you would have to start exploring usage of the $elemMatch (projection) operator (not the query variant) or, as @nimrod serok mentions in the comments, using the $filter aggregation operator in an aggregation pipeline. The last option here is certainly the most expressive and flexible, but also the most verbose.
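To give a feel for the $filter option, here is a minimal sketch that builds such a pipeline for the document shape in the question. The helper name buildVehiclePipeline is hypothetical, and the sketch assumes a server version that supports $addFields and $filter.

```javascript
// Hypothetical helper: builds an aggregation pipeline that returns the dealer
// with only the Vehicles entries whose Id matches vehicleId.
function buildVehiclePipeline(dealerId, vehicleId) {
  return [
    // Select the dealer document first, as the original findOne filter did.
    { $match: { Id: dealerId, "Vehicles.Id": vehicleId } },
    {
      // Overwrite Vehicles with the subset of elements that satisfy cond.
      $addFields: {
        Vehicles: {
          $filter: {
            input: "$Vehicles",
            as: "vehicle",
            cond: { $eq: ["$$vehicle.Id", vehicleId] },
          },
        },
      },
    },
  ];
}

// With Mongoose this could be passed to Model.aggregate(...);
// in the shell, to db.collection.aggregate(...).
console.log(JSON.stringify(buildVehiclePipeline(123, 1234), null, 2));
```

Unlike the positional $ projection, which returns only the first matching element, $filter keeps every element that satisfies the condition.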

MongoDB big collection aggregation is slow

I'm having a problem with the duration of my MongoDB query, issued from a Node backend using Mongoose. I have a collection called people that has 10M records. Every record is queried from the backend and inserted by another part of the system that is written in C++ and needs to be very fast.
this is my mongoose schema:
{
_id: {type: String, index: {unique: true}}, // We generate our own _id! Might it be related to the slowness?
age: { type: Number },
id_num: { type: String },
friends: { type: Object }
}
schema.index({'id_num': 1}, { unique: true, collation: { locale: 'en_US', strength: 2 } })
schema.index({'age': 1})
schema.index({'id_num': 'text'});
friends is an object that looks like this: {"Adam": true, "Eve": true... etc.}.
The values carry no meaning; we use dictionaries so that the C++ side can avoid duplicates quickly.
Also, we did not find a set or unique-list field type in MongoDB.
The Problem:
We display people in a table with pagination. the table has abilities of sort, search, and select number of results.
At first, I queried all people and did the searching, sorting and paging in JS, but with a lot of documents this becomes problematic (memory problems).
The next thing I did was to try to fit those manipulations (searching, sorting and paging) into my query.
I used Mongo's text search, but it does not match partial words. Is there any way to do a case-insensitive partial string search? (I prefer not to use regex, to avoid unexpected problems.)
I have to sort before paging, so I tried to use Mongo's sort. The problem is that when the user wants to sort by "Friends", we want to return the people sorted by their number of friends (the number of entries in the object).
The only way I managed to pull it off was using $addFields in an aggregation:
{$addFields: {friends_count: {$size: {$ifNull: [{$objectToArray: '$friends'}, [] ]}}}}
This addition takes forever! When sorting by friends, the query takes about 40s for 8M people, and without this part it takes less than a second.
I used limit and skip for pagination. It works OK, but we have to wait until the user requests the second page and then make another very long query.
In the end, this is the interesting part of the code:
const { sortBy, sortDesc, search, page, itemsPerPage } = req.query
// Search never matches partial string
const match = search ? {$text: {$search: search}} : {}
const sortByInDB = ['age', 'id_num']
let sort = {$sort : {}}
const aggregate = [{$match: match}]
// if sortBy is on a simple type, we just use mongos sort
// else, we sortBy friends, and add a friends_count field.
if(sortByInDB.includes(sortBy)){
sort.$sort[sortBy] = sortDesc === 'true' ? -1 : 1
} else {
sort.$sort[sortBy+'_count'] = sortDesc === 'true' ? -1 : 1
// The problematic part of the query:
aggregate.push({$addFields: {friends_count: {$size: {
$ifNull: [{$objectToArray: '$friends'},[]]
}}}})
}
const numItems = parseInt(itemsPerPage)
const numPage = parseInt(page)
aggregate.push(sort, {$skip: (numPage - 1)*numItems}, {$limit: numItems})
// Takes a long time (when sorting by "friends")
let users = await User.aggregate(aggregate)
I tried indexing all the simple fields, but the query still takes too long.
The only other solution I could think of is making Mongo calculate a "friends_count" field every time a document is created or updated, but I have no idea how to do that without slowing down the C++ code that writes to the DB.
Do you have any creative ideas to help me? I'm lost, and I have to shorten the time drastically.
Thank you!
P.S.: Some useful information: the C++ side writes the people to the DB in bulk once in a while. We can sync once in a while and mostly rely on the data being accurate. So, if that gives any of you an idea for a performance boost, I'd love to hear it.
Thanks!

Unique index in mongoDB 3.2 ignoring null values

I want to add a unique index to a field that ignores null values in the indexed field and also ignores the documents filtered out by a partialFilterExpression.
The problem is that sparse indexes can't be combined with partial indexes.
Also, a unique index adds the null value to the index key field, so the documents can't be ignored based on an $exists criterion in the partialFilterExpression.
Is it possible in MongoDB 3.2 to get around this situation?
I am adding this answer because I was looking for a solution and didn't find one. It may or may not answer this exact question, but it will help a lot of others out there like me.
For example, if the field that may be null is houseName and it is of type string, the solution can look like this:
db.collectionName.createIndex(
{name: 1, houseName: 1},
{unique: true, partialFilterExpression: {houseName: {$type: "string"}}}
);
This will ignore the null values in the field houseName and still be unique.
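To illustrate why this works, here is a toy in-memory model written in plain JavaScript. It is not MongoDB and does not use any driver; createPartialUniqueIndex and tryInsert are hypothetical names. It only mimics the rule that documents failing the filter expression are left out of the index and therefore can never trigger a duplicate key conflict.

```javascript
// Toy in-memory model of a unique partial index (plain JavaScript, not MongoDB).
// Documents that fail the filter are never added to the index, so they can
// repeat freely; documents that pass it must have a unique key.
function createPartialUniqueIndex(keyFields, passesFilter) {
  const seenKeys = new Set();
  return {
    // Returns true if the document is accepted, false on a duplicate key.
    tryInsert(doc) {
      if (!passesFilter(doc)) return true; // not indexed, cannot conflict
      const key = JSON.stringify(keyFields.map((field) => doc[field]));
      if (seenKeys.has(key)) return false;
      seenKeys.add(key);
      return true;
    },
  };
}

// Mirrors { houseName: { $type: "string" } } from the index definition above.
const index = createPartialUniqueIndex(
  ["name", "houseName"],
  (doc) => typeof doc.houseName === "string"
);
console.log(index.tryInsert({ name: "a", houseName: null }));   // true
console.log(index.tryInsert({ name: "a", houseName: null }));   // true (null is not indexed)
console.log(index.tryInsert({ name: "a", houseName: "Rose" })); // true
console.log(index.tryInsert({ name: "a", houseName: "Rose" })); // false (duplicate key)
```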
Yes, you can create partial index in MongoDB 3.2
Please see https://docs.mongodb.org/manual/core/index-partial/#index-type-partial
MongoDB recommends using partial indexes over sparse indexes. I suggest you drop your sparse index in favor of a partial index.
You can create a partial index in MongoDB 3.2.
For example, if ipaddress can be "" but a value such as "127.0.0.1" should be unique, the solution can look like this:
db.collectionName.createIndex(
{"ipaddress":1},
{"unique":true, "partialFilterExpression":{"ipaddress":{"$gt":""}}})
This will ignore "" values in the ipaddress field and still be unique.
To create it in MongoDB Compass, you must write the expression as JSON:
{
"YourField" : {
"$exists" : true,
"$gt" : "0",
"$type" : "string"
}
}
For the other supported types, see this link.
Yes, it can be a problem that the partial filter expression cannot contain any 'not' filters.
For those who are interested in a C# solution for an index like this, here is an example.
We have a 'User' entity, which has a one-to-one 'relation' to a 'Doctor' entity.
This relation is represented by the optional, nullable field 'DoctorId' in the 'User' entity. In other words, there is a requirement that a given 'Doctor' can be linked to only a single 'User' at a time.
So we need a unique index that raises an exception when something attempts to set DoctorId to a Guid that is already set for another 'User' entity. At the same time, multiple 'null' entries must be allowed for the 'DoctorId' field, since many users do not have a doctor attached to them.
The solution to build this kind of index looks like this:
var uniqueDoctorIdIndexDefinition = new IndexKeysDefinitionBuilder<User>()
.Ascending(o => o.DoctorId);
var existsFilter = Builders<User>.Filter.Exists(o => o.DoctorId);
var notNullFilter = Builders<User>.Filter.Type(o => o.DoctorId, BsonType.String);
var andFilter = Builders<User>.Filter.And(existsFilter, notNullFilter);
var createIndexOptions = new CreateIndexOptions<User>
{
Unique = true,
Name = UniqueDoctorIdIndexName,
PartialFilterExpression = andFilter,
};
var uniqueDoctorIdIndex = new CreateIndexModel<User>(
uniqueDoctorIdIndexDefinition,
createIndexOptions);
users.Indexes.CreateOne(uniqueDoctorIdIndex);
You probably need to specify the BsonType of the 'DoctorId' field explicitly in the definition of the 'User' entity, for example with an attribute. In our case it was:
[BsonRepresentation(BsonType.String)]
public Guid? DoctorId { get; set; }
I am more than sure that there is a more proficient and compact solution for this problem, so would be happy if somebody suggests it here.
Here is an example that I modified from the mongoDB partial index documentation:
db.contacts.createIndex(
{ email: 1 },
{ unique: true, partialFilterExpression: { email: { $exists: true } } }
)
IMPORTANT
To use the partial index, a query must contain the filter expression (or a modified filter expression that specifies a subset of the filter expression) as part of its query condition.
You can see that queries such as:
db.contacts.find({'email':'name#email.com'}).explain()
will indicate that they are doing an index scan, even if you don't specify {$exists: true}, because you're implicitly specifying a subset of the partialFilterExpression by specifying an email in your filter.
On the other hand, the following query will do a collection scan:
db.contacts.find({email: {$exists: false}})
WARNING
mythicalcoder's answer (currently the highest voted answer) is very misleading because it successfully creates a unique index, but the query planner will not generally be able to use the index you've created unless you add houseName: {$type: "string"} into your filter expression. This can have performance costs which you might not be aware of and can cause problems down the road.

Why was the explain query output giving me BasicCursor even though the collection had indexes on it?

I have a collection named stocks, and I created a compound index on it as shown below:
db.stocks.ensureIndex({"symbol":1,"date":1,"type": 1, "isValid": 1,"rootsymbol":1,"price":1},{"unique" : false})
I have set the profiling level to find all the slow queries.
One of the queries, shown below, took 38 millis. When I ran explain on it, this was the result:
Sorry, I have updated my question.
db.stocks.find({ query: { symbol: "AAPLE", date: "2014-01-18", type: "O", isValid: true }, orderby: { price: "1" } }).explain();
{
"cursor" : "BasicCursor",
"nscanned" : 705402,
"nscannedObjects" : 705402,
"n" : 0,
"millis" : 3456,
"indexBounds" : {
}
}
My question is: why is it showing a BasicCursor even though the collection has indexes on it?
I'm pretty sure that the issue here is your use of the find() function. You are specifying a query parameter and inside it, placing your search criteria. I don't think that you need to actually put query in there. Simply insert your search criteria. Something like this:
db.stocks.find({
symbol: "AAPLE",
date: "2014-01-18",
type: "O",
isValid: true
}).sort( { "price": 1} ).explain();
Note also my changes to the sorting. You can read more about sorting a cursor here.
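The rewrite above can be sketched as a small helper that unwraps the query and orderby keys into a plain filter and a sort specification. The function name splitLegacyQuery is hypothetical, and the sketch assumes the wrapper has the same shape as the one in the question, including sort directions given as strings such as "1".

```javascript
// Hypothetical helper: splits a wrapper like { query: {...}, orderby: {...} }
// into a plain filter document and a sort specification.
function splitLegacyQuery(wrapper) {
  const filter = wrapper.query || {};
  const sort = {};
  for (const [field, direction] of Object.entries(wrapper.orderby || {})) {
    // Normalize the direction: negative values become -1, anything else 1,
    // so the string "1" from the question becomes the number 1.
    sort[field] = Number(direction) < 0 ? -1 : 1;
  }
  return { filter, sort };
}

const { filter, sort } = splitLegacyQuery({
  query: { symbol: "AAPLE", date: "2014-01-18", type: "O", isValid: true },
  orderby: { price: "1" },
});
// In the shell these would be used as: db.stocks.find(filter).sort(sort).explain()
console.log(JSON.stringify({ filter, sort }));
```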
Since the problem isn't actually described I will go on to describe it.
You are calling top level query operators with functional operators. So for example you call query operators here:
{ query: { symbol: "AAPLE", date: "2014-01-18", type: "O", isValid: true }, orderby: { price: "1" } }
In the form of query and orderby but then you call a functional operator:
explain();
This is a known bug with MongoDB that these two do not play well together and so produce the output you get.
Of course when the query comes in and is parsed by MongoDB it is recorded in the profile with query operators query and orderby and maxscan etc.
This is more of a problem when calling the command.
Reference: MongoDB $query operator ignores index? I couldn't find the actual JIRA for this but this is related.
Edit: I think this vaguely represents it: https://jira.mongodb.org/browse/SERVER-6767
The syntax is not the problem. In order for MongoDB to use a compound index (i.e., an index that contains more than one field), the fields in your query/sort must be a prefix of the index fields. In this case, your index includes these fields: symbol, date, type, isValid, rootsymbol, and price. Your query/sort includes all fields except rootsymbol, so the index cannot be used. Possible solutions:
Remove rootsymbol from the index, or
Add rootsymbol to your query, or
If you can't do either of the above, create another index without rootsymbol
Reference
Regarding the syntax, there is in fact a query syntax in which an index cannot be used: the $where clause requires evaluating inline JavaScript, so indexes cannot be used. For example:
db.collection.find( { $where: "field1.value > field2.value" } )

MongoDB update fields of subarrays that meet criteria

I am having a problem where I need to update a specific field found in arrays contained in a bigger array that match certain criteria as of MongoDB v2.2.3.
I have the following mongodb sample document.
{
_id: ObjectId("50be30b64983e5100a000009"),
user_id: 0,
occupied: {
Greece: [
{
user_id: 3,
deadline: ISODate("2013-02-08T19:19:28Z"),
fulfilled: false
},
{
user_id: 4,
deadline: ISODate("2013-02-16T19:19:28Z"),
fulfilled: false
}
],
Italy: [
{
user_id: 2,
deadline: ISODate("2013-02-15T19:19:28Z"),
fulfilled: false
}
]
}
}
Each country in the occupied object has its own array of entries.
What I am trying to do is find the document where user_id is 0, search through the occupied.Greece array only for elements that have "deadline": {$gt: ISODate(current-date)} and change their individual "fulfilled" fields to true.
I have tried the $ and $elemMatch operators but they match only one, the first, array element in a query whereas I need it to match all eligible elements by the given criteria and make the update without running the same query multiple times or having to process the arrays client-side.
Is there no server-side solution for generic updates in a single document? I am developing using PHP though a solution to this should be universal.
I'm afraid this is not possible. From the documentation:
Remember that the positional $ operator acts as a placeholder for the first match of the update query selector. [emphasis not mine]
This is tracked in the MongoDB Jira under SERVER-1243.
There are quite a number of related feature requests in the Jira, mostly under the topic 'virtual collections'.