Sorting nested objects in MongoDB - mongodb

So I have documents that follow this schema:
{
_id: String,
globalXP: {
xp: {
type: Number,
default: 0
},
level: {
type: Number,
default: 0
}
},
guilds: [{ _id: String, xp: Number, level: Number }]
}
So basically users have their own global XP and xp based on each guild they are in.
Now I want to make a leaderboard for all the users that have a certain guildID in their document.
What's the most efficient way to fetch all the user documents that have the guild _id in their guilds array and how do I sort them afterwards?
I know it might be messy as hell, but bear with me here.

If I've understood correctly, you only need this line of code:
var find = await model.find({"guilds._id":"your_guild_id"}).sort({"globalXP.level":-1})
This query returns all documents whose guilds array contains the specified _id and sorts them by the player's global level.
Because the sort is descending, the highest level is displayed first.
Here is an example of how the query works. Please check whether it works as you expect.
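If the leaderboard should rank users by their XP inside that specific guild rather than by global level, an aggregation pipeline can pull out the matching guilds entry and sort on it. The following is only a minimal sketch under assumptions: buildGuildLeaderboardPipeline, the temporary guild field and the limit parameter are hypothetical names, and model is assumed to be the Mongoose model used in the query above.

```javascript
// Hypothetical helper: builds a pipeline that ranks users by their XP
// in one guild. Nothing here talks to the database; it only returns
// the stages to pass to aggregate().
function buildGuildLeaderboardPipeline(guildId, limit = 10) {
  return [
    // Keep only users that are members of the guild
    { $match: { 'guilds._id': guildId } },
    // Copy the matching guilds entry into a temporary "guild" field
    {
      $addFields: {
        guild: {
          $arrayElemAt: [
            {
              $filter: {
                input: '$guilds',
                as: 'g',
                cond: { $eq: ['$$g._id', guildId] },
              },
            },
            0,
          ],
        },
      },
    },
    // Highest guild XP first
    { $sort: { 'guild.xp': -1 } },
    { $limit: limit },
  ];
}

// Assumed usage with the model from the answer above:
// const top = await model.aggregate(buildGuildLeaderboardPipeline('your_guild_id', 25));
```

An index on guilds._id should let the initial $match avoid scanning the whole collection.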

Related

Mongoose findOne not working as expected on nested records

I've got a collection in MongoDB whose simplified version looks like this:
Dealers = [{
Id: 123,
Name: 'Someone',
Email: 'someone@somewhere.com',
Vehicles: [
{
Id: 1234,
Make: 'Honda',
Model: 'Civic'
},
{
Id: 2345,
Make: 'Ford',
Model: 'Focus'
},
{
Id: 3456,
Make: 'Ford',
Model: 'KA'
}
]
}]
And my Mongoose Model looks a bit like this:
const vehicle_model = mongoose.Schema({
Id: {
type: Number
},
Email: {
type: String
},
Vehicles: [{
Id: {
type: Number
},
Make: {
type: String
},
Model: {
type: String
}
}]
})
Note the Ids are not MongoDB Ids, just distinct numbers.
I try doing something like this:
const response = await vehicle_model.findOne({ 'Id': 123, 'Vehicles.Id': 1234 })
But when I do:
console.log(response.Vehicles.length)
It has returned all of the nested Vehicles records instead of just the one I'm after.
What am I doing wrong?
Thanks.
This question is asked very frequently. Indeed someone asked a related question here just 18 minutes before this one.
When you query the database, you are requesting that it identify and return matching documents to the client. That is an entirely separate action from asking it to transform the shape of those documents before they are sent back to the client.
In MongoDB, the latter operation (transforming the shape of the document) is usually referred to as "Projection". Simple projections, specifically just returning a subset of the fields, can be done directly in find() (and similar) operations. Most drivers and the shell use the second argument to the method as the projection specification, see here in the documentation.
Your particular case is a little more complicated because you are looking to trim off some of the values in the array. There is a dedicated page in the documentation titled Project Fields to Return from Query which goes into more detail about different situations. Indeed near the bottom is a section titled Project Specific Array Elements in the Returned Array which describes your situation more directly. In it is where they describe usage of the positional $ operator. You can use that as a starting place as follows:
db.collection.find({
"Id": 123,
"Vehicles.Id": 1234
},
{
"Vehicles.$": 1
})
Playground demonstration here.
If you need something more complex, then you would have to start exploring usage of the $elemMatch (projection) operator (not the query variant) or, as @nimrod serok mentions in the comments, using the $filter aggregation operator in an aggregation pipeline. The last option here is certainly the most expressive and flexible, but also the most verbose.
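To make the $filter option more concrete, here is a minimal sketch of what such a pipeline could look like for the dealer example. buildVehiclePipeline is a hypothetical helper name, and the usage line assumes a Mongoose model (called DealerModel here) compiled from the schema in the question.

```javascript
// Hypothetical helper: returns the dealer with only the matching
// vehicle(s) left in the Vehicles array.
function buildVehiclePipeline(dealerId, vehicleId) {
  return [
    // Select the document, exactly as the findOne() filter does
    { $match: { Id: dealerId, 'Vehicles.Id': vehicleId } },
    // Overwrite Vehicles with the filtered array
    {
      $addFields: {
        Vehicles: {
          $filter: {
            input: '$Vehicles',
            as: 'vehicle',
            cond: { $eq: ['$$vehicle.Id', vehicleId] },
          },
        },
      },
    },
  ];
}

// Assumed usage:
// const [dealer] = await DealerModel.aggregate(buildVehiclePipeline(123, 1234));
```

Unlike the positional $ projection, which returns only the first matching element, $filter keeps every element that satisfies cond.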

What is the proper way of combining 2 documents in MongoDB

I currently have 2 collections:
users that looks like:
const User = new Schema({
username:{
type: String,
required: true
},
password:{
type: String,
required: true
},
refreshTokens:{
type: String,
required: false,
},
// ID of the guild a user belongs to
guildID:{
type: Schema.Types.ObjectId,
ref: 'guilds',
default: '61a679e18d84bff40c2f88fd',
required: true
},
power:{
type: Number,
required: true,
default: 100
}
})
guilds contains the objectID as _id and a field "name".
Now I would like to get a document by username and also the information of the guild that the user belongs to.
I read about using db.collection.aggregate; this, however, returns all users and their guild information. Is it possible to use $match inside the aggregation to get just that single username? I'm fairly new to MongoDB and am just trying things out. If you have any resources or documentation I'd be happy to read those too!
In SQL it would look something like:
SELECT * FROM users INNER JOIN guilds ON users.guildID = guilds.id WHERE users.username = 'SomeUsername'
Aggregations can solve this (not recommended)
userCollection.aggregate([
{
// Join each user with its guild document
$lookup: {
from: 'guilds',
as: 'guild',
localField: 'guildID',
foreignField: '_id',
}
},
{
// $lookup produces an array; flatten it into a single object
$unwind: {
path: '$guild',
preserveNullAndEmptyArrays: true
}
},
{
$match: {
$or: [
{ 'guild._id': guildId },
// ... other options ...
]
}
}
])
While this works and can be reasonably fast depending on your indexes and number of documents, it can be better to add frequently queried fields to the related documents. In your case: add guildId and guildName to your user.
While this duplicates data and might not be considered best practice in relational databases, it is common in document-based databases. This is the fastest solution.
The alternative to an aggregation or to embedding guild data in the user is to send two queries: one for the user, then one for the guild. This is called the relationship pattern, and I believe it is the most common solution.
Many (all?) ODM libraries, such as Mongoose, resolve relationships automatically for you (Mongoose calls this population), which can simplify querying a lot, I think!
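To answer the $match part of the question directly: yes, a $match stage can come first, so that only the one user is joined. A minimal sketch, assuming the field and collection names from the question; buildUserWithGuildPipeline is a hypothetical helper name, and the usage line assumes a model compiled from the User schema.

```javascript
// Hypothetical helper: fetches one user by username together with
// the guild document the user belongs to.
function buildUserWithGuildPipeline(username) {
  return [
    // Filter first, so the $lookup runs for a single document
    { $match: { username } },
    {
      $lookup: {
        from: 'guilds',
        localField: 'guildID',
        foreignField: '_id',
        as: 'guild',
      },
    },
    // $lookup returns an array; turn it into a single embedded object
    { $unwind: { path: '$guild', preserveNullAndEmptyArrays: true } },
  ];
}

// Assumed usage:
// const [user] = await UserModel.aggregate(buildUserWithGuildPipeline('SomeUsername'));
```

Placing $match at the start also lets MongoDB use an index on username, if one exists.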

MongoDB big collection aggregation is slow

I'm having a problem with the execution time of my MongoDB query, run from a Node backend using Mongoose. I have a collection called people that holds 10M records. Every record is queried from the backend and inserted by another part of the system that's written in C++ and needs to be very fast.
This is my Mongoose schema:
{
_id: {type: String, index: {unique: true}}, // We generate our own _id! Might it be related to the slowness?
age: { type: Number },
id_num: { type: String },
friends: { type: Object }
}
schema.index({'id_num': 1}, { unique: true, collation: { locale: 'en_US', strength: 2 } })
schema.index({'age': 1})
schema.index({'id_num': 'text'});
friends is an object that looks like this: {"Adam": true, "Eve": true... etc.}.
The values have no meaning; we use dictionaries to avoid duplicates quickly in C++.
Also, we didn't find a set/unique-list field type in MongoDB.
The Problem:
We display people in a table with pagination. The table supports sorting, searching, and choosing the number of results per page.
At first, I queried all people and then searched, sorted and paged them in JS. But with a lot of documents this becomes problematic (memory problems).
The next thing I did was to try to move those operations (searching, sorting and paging) into my query.
I used Mongo's text search, but it does not match partial words. Is there any way to do a case-insensitive search for a partial string? (I prefer not to use regex, to avoid unexpected problems.)
I have to sort before paging, so I tried to use Mongo's sort. The problem is that when the user wants to sort by "Friends", we want to return the people sorted by their number of friends (the number of entries in the object).
The only way I managed to pull it off was using $addFields in an aggregation:
{$addFields: {friends_count: {$size: {$ifNull: [{$objectToArray: '$friends'}, [] ]}}}}
This addition takes forever! When sorting by friends, the query takes about 40s for 8M people; without this part it takes less than a second.
I used limit and skip for pagination. It works OK, but when the user requests the second page we have to run another very long query.
In the end, this is the interesting part of the code:
const { sortBy, sortDesc, search, page, itemsPerPage } = req.query
// Search never matches partial string
const match = search ? {$text: {$search: search}} : {}
const sortByInDB = ['age', 'id_num']
let sort = {$sort : {}}
const aggregate = [{$match: match}]
// if sortBy is on a simple type, we just use mongos sort
// else, we sortBy friends, and add a friends_count field.
if(sortByInDB.includes(sortBy)){
sort.$sort[sortBy] = sortDesc === 'true' ? -1 : 1
} else {
sort.$sort[sortBy+'_count'] = sortDesc === 'true' ? -1 : 1
// The problematic part of the query:
aggregate.push({$addFields: {friends_count: {$size: {
$ifNull: [{$objectToArray: '$friends'},[]]
}}}})
}
const numItems = parseInt(itemsPerPage, 10)
const numPage = parseInt(page, 10)
aggregate.push(sort, {$skip: (numPage - 1)*numItems}, {$limit: numItems})
// Takes a long time (when sorting by "friends")
let users = await User.aggregate(aggregate)
I tried indexing all the simple fields, but the query still takes too long.
The only other solution I could think of is making Mongo calculate a "friends_count" field every time a document is created or updated, but I have no idea how to do that without slowing down our C++ code that writes to the DB.
Do you have any creative ideas to help me? I'm lost, and I have to shorten the time drastically.
Thank you!
P.S.: Some useful information: the C++ side writes the people to the DB in bulk once in a while. We can sync once in a while and mostly rely on the data being correct. So, if that gives any of you an idea for a performance boost, I'd love to hear it.
Thanks!
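One way the precomputed "friends_count" idea could be sketched, given that an occasional sync is acceptable: run a pipeline-style update after each bulk write and sort on the stored, indexed field instead of computing it per query. This is an untested sketch under assumptions: pipeline-style updates need MongoDB 4.2 or later and a Mongoose version that accepts them, and buildFriendsCountUpdate and buildFriendsPage are hypothetical helper names.

```javascript
// Hypothetical helper: update pipeline that stores the number of keys
// in `friends` as `friends_count` (same expression as in the question).
function buildFriendsCountUpdate() {
  return [
    {
      $set: {
        friends_count: {
          $size: { $ifNull: [{ $objectToArray: '$friends' }, []] },
        },
      },
    },
  ];
}

// Hypothetical helper: sort and page on the stored field, so the
// $addFields stage is no longer needed at query time.
function buildFriendsPage(page, itemsPerPage, sortDesc) {
  return [
    { $sort: { friends_count: sortDesc ? -1 : 1 } },
    { $skip: (page - 1) * itemsPerPage },
    { $limit: itemsPerPage },
  ];
}

// Assumed usage, run after each bulk write from the C++ side:
// await User.updateMany({}, buildFriendsCountUpdate());
// schema.index({ friends_count: 1 });
// const users = await User.aggregate(buildFriendsPage(2, 50, true));
```

With an index on friends_count, the sort can read from the index rather than sorting millions of computed values in memory. Alternatively, since the C++ writer already holds the dictionary, it could write the count itself at insert time.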

Creating compound indexes that will match queries in MongoDB

For our app I'm using the free tier (for now) on MongoDB Atlas.
Our documents have, among other fields, a start time, which is a Date object, and a userId, which is an integer.
export interface ITimer {
id?: string,
_id?: string, // id property assigned by mongo
userId?: number,
projectId?: number,
description?: string,
tags?: number[],
isBillable?: boolean,
isManual?: boolean,
start?: Date,
end?: Date,
createdAt?: Date,
updatedAt?: Date,
createdBy?: number,
updatedBy?: number
};
I'm looking for an index that will match the following query:
let query: FilterQuery<ITimer> = {
start: {$gte: start, $lte: end},
userId: userId,
};
Where start and end parameters are date objects or ISOStrings passed to define a range of days.
Here I invoke the query, sorting results:
let result = await collection.find(query).sort({start: 1}).toArray();
It seems simple enough that the following index would match the above query:
{
key: {
start: 1,
userId: 1,
},
name: 'find_between_dates_by_user',
background: true,
unique: false,
},
But using MongoDB Compass to monitor the collection, I see this index is not used.
Moreover, the MongoDB documentation specifically states that if an index matches a query completely, then no documents have to be examined, and the results are based on the indexed information alone.
Unfortunately, for every query I run, I see documents ARE examined, meaning my indexes do not match.
Any suggestions? I feel this should be very simple and straightforward, but maybe I'm missing something.
Attached is a screenshot from MongoDB Compass 'explaining' the query and execution.
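A possible direction, offered as an assumption rather than a diagnosis of the screenshot: MongoDB's general guideline for compound indexes is to put equality fields first, then sort fields, then range fields. For this query that would mean userId before start. Separately, a query is only covered (no documents examined) when its projection is limited to indexed fields, which a plain find() returning whole documents is not. The sketch below uses hypothetical names (suggestedIndex, coveredProjection) and the same index-spec shape as in the question.

```javascript
// Equality field (userId) first, then the sort/range field (start).
const suggestedIndex = {
  key: { userId: 1, start: 1 },
  name: 'find_by_user_between_dates',
};

// A covered query may return only indexed fields; _id must be
// excluded because it is not part of this index.
const coveredProjection = { _id: 0, userId: 1, start: 1 };

// Assumed usage with the Node.js driver:
// await collection.createIndex(suggestedIndex.key, { name: suggestedIndex.name });
// const result = await collection
//   .find(query, { projection: coveredProjection })
//   .sort({ start: 1 })
//   .toArray();
```

If the full documents are needed, documents will still be examined even with the reordered index; the goal then is for the number of documents examined to be close to the number returned.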

Calculating collection stats for a subset of documents in MongoDB

I know the cardinal rule of SE is to not ask a question without giving examples of what you've already tried, but in this case I can't find where to begin. I've looked at the documentation for MongoDB and it looks like there are only two ways to calculate storage usage:
db.collection.stats() returns statistics about the entire collection. In my case I need to know the amount of storage used by a subset of the data within a collection (the data for a particular user).
Object.bsonsize(<document>) returns the storage size of a single record, which would require a cursor function to calculate the size of each document, one at a time. My only concern with this approach is performance with large amounts of data. If a single user has tens of thousands of documents this process could take too long.
Does anyone know of a way to efficiently and accurately calculate the aggregate document size of a set of records within a collection?
Thanks for the help.
This may not be the most efficient or accurate way to do it, but I ended up using a Mongoose plugin to get the size of the JSON representation of the document before it's saved:
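const mongoose = require('mongoose');
// Mongoose plugin: stores an approximate size on each document before it is saved
module.exports = exports = function defaultPlugin(schema, options){
schema.add({
userId: { type: mongoose.Schema.Types.ObjectId, ref: "User", required: true },
recordSize: Number
});
schema.pre('save', function(next) {
// Length of the JSON string in characters, not the BSON size on disk
this.recordSize = JSON.stringify(this).length;
next();
});
}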
This converts the document to its JSON representation, gets its length, and stores that size in the document itself. Note that this is the length of the JSON string in characters, which only approximates the BSON size stored on disk. I understand that this will actually add a tiny bit of extra storage to record the size, but it's the best I could come up with.
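As a small illustration of why the string length is only an approximation: for non-ASCII text, the character count and the UTF-8 byte count differ. The sketch below assumes Node.js (for Buffer); jsonSizes is a hypothetical helper name.

```javascript
// Hypothetical helper: compares the JSON string length with its
// UTF-8 byte length.
function jsonSizes(doc) {
  const json = JSON.stringify(doc);
  return {
    characters: json.length, // UTF-16 code units
    bytes: Buffer.byteLength(json, 'utf8'), // bytes when encoded as UTF-8
  };
}

const sizes = jsonSizes({ name: 'Jos\u00e9' });
// characters: 15, bytes: 16 (the accented letter takes two bytes in UTF-8)
console.log(sizes);
```

Neither number equals the BSON size, since BSON adds type and length information, but the byte count is usually the closer estimate.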
Then, to generate a storage report, I'm using a simple aggregate call to get the sum of all of the recordSize values in the collection, filtered by userId:
mongoose.model('YourCollectionName').aggregate([
{
$match: {
userId: userId
}
},
{
$group: {
_id: null,
recordSize: { $sum: '$recordSize'},
recordCount: { $sum: 1 }
}
}
], function (err, results) {
//Do something with your results
});
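On MongoDB 4.4 or later, the $bsonSize aggregation operator can compute the size on the server, which would make the stored recordSize field unnecessary. A minimal sketch under that version assumption; buildStorageReportPipeline and totalBytes are hypothetical names, and the usage line reuses the model lookup from the answer above.

```javascript
// Hypothetical helper: sums the BSON size of all documents that
// belong to one user.
function buildStorageReportPipeline(userId) {
  return [
    { $match: { userId } },
    {
      $group: {
        _id: null,
        // $$ROOT is the whole document currently being processed
        totalBytes: { $sum: { $bsonSize: '$$ROOT' } },
        recordCount: { $sum: 1 },
      },
    },
  ];
}

// Assumed usage:
// const [report] = await mongoose.model('YourCollectionName')
//   .aggregate(buildStorageReportPipeline(userId));
```

This reports the uncompressed BSON size of the documents, which can differ from the space they occupy on disk once storage-engine compression is applied.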