MongoDB nodejs insertMany takes too much time - mongodb

In a Node.js backend I have a function where I insert some values into a MongoDB collection.
I'm using this:
const result = await SomeModel.insertMany(someArrayOfData);
This takes a lot of time, like 2 minutes.
I'm inserting around 11,000 documents, and each document follows this schema:
const someModelSchema = mongoose.Schema({
  IDEvento: Number,
  descMet: String,
  metrica: Number,
  local: String,
  porcent: Number,
  tipoBono: String,
  year: String
}, { timestamps: true });
Once inserted, I can see the values in the collection as expected.
What could be the reasons for it taking so long? The array is not big (its length is 2,800): only 2,800 documents, each of which is pretty simple, with no nested objects or long strings, just very short values.
Also, there's a:
await SomeModel.deleteMany({ year });
and it also takes a long time, about 1 minute.
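For context, one common way to narrow this down is to time the insert in isolation and split the array into smaller, unordered batches; the snippet below is only a sketch under that assumption (the batch size and helper name are mine, not from the original code):
// Hypothetical sketch: insert in smaller, unordered batches and time each one.
// The batch size of 1000 and the function name are assumptions to illustrate the idea.
async function insertInBatches(docs, batchSize = 1000) {
  for (let i = 0; i < docs.length; i += batchSize) {
    const chunk = docs.slice(i, i + batchSize);
    console.time(`insertMany ${i}-${i + chunk.length}`);
    // ordered: false lets MongoDB continue past individual write errors
    // instead of stopping the whole batch at the first failure.
    await SomeModel.insertMany(chunk, { ordered: false });
    console.timeEnd(`insertMany ${i}-${i + chunk.length}`);
  }
}
Similarly, the deleteMany({ year }) call has to scan the whole collection unless there is an index on year, which by itself could explain a delete taking on the order of a minute.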

Related

MongoDB big collection aggregation is slow

I'm having a problem with the execution time of my MongoDB query, from a Node backend using Mongoose. I have a collection called people with 10M records; every record is queried from the backend and is inserted by another part of the system, written in C++, that needs to be very fast.
This is my Mongoose schema:
{
  _id: { type: String, index: { unique: true } }, // We generate our own _id! Might it be related to the slowness?
  age: { type: Number },
  id_num: { type: String },
  friends: { type: Object }
}
schema.index({ id_num: 1 }, { unique: true, collation: { locale: 'en_US', strength: 2 } })
schema.index({ age: 1 })
schema.index({ id_num: 'text' });
friends is an object that looks like {"Adam": true, "Eve": true, ...}.
There's no meaning to the values; we use dictionaries to deduplicate quickly on the C++ side.
Also, we didn't find a set/unique-list type of field in MongoDB.
The Problem:
We display people in a table with pagination. The table supports sorting, searching, and selecting the number of results per page.
At first, I queried all the people and did the searching, sorting, and paging in JS, but with a lot of documents this became problematic (memory problems).
The next thing I did was try to push those manipulations (searching, sorting & paging) into the query itself.
I used Mongo's text search, but it does not match partial words. Is there any way to search for a partial, case-insensitive string? (I prefer not to use regex, to avoid unexpected problems.)
I have to sort before paging, so I tried to use Mongo's sort. The problem is that when the user wants to sort by "Friends", we want to return the people sorted by their number of friends (the number of entries in the object).
The only way I managed to pull it off was using $addFields in an aggregation:
{$addFields: {friends_count: {$size: {$ifNull: [{$objectToArray: '$friends'}, []]}}}}
This addition takes forever! When sorting by friends, the query takes about 40s for 8M people; without this stage it takes less than a second.
I used limit and skip for pagination. It works OK, but we have to wait until the user requests the second page and then run another very long query.
In the end, this is the interesting code part:
const { sortBy, sortDesc, search, page, itemsPerPage } = req.query
// Search never matches a partial string
const match = search ? { $text: { $search: search } } : {}
const sortByInDB = ['age', 'id_num']
let sort = { $sort: {} }
const aggregate = [{ $match: match }]
// If sortBy is a simple field, we just use Mongo's sort;
// otherwise we sort by friends and add a friends_count field.
if (sortByInDB.includes(sortBy)) {
  sort.$sort[sortBy] = sortDesc === 'true' ? -1 : 1
} else {
  sort.$sort[sortBy + '_count'] = sortDesc === 'true' ? -1 : 1
  // The problematic part of the query:
  aggregate.push({ $addFields: { friends_count: { $size: {
    $ifNull: [{ $objectToArray: '$friends' }, []]
  } } } })
}
const numItems = parseInt(itemsPerPage)
const numPage = parseInt(page)
aggregate.push(sort, { $skip: (numPage - 1) * numItems }, { $limit: numItems })
// Takes a long time (when sorting by "friends")
let users = await User.aggregate(aggregate)
I tried indexing all the simple fields, but the time is still too long.
The only other solution I could think of is having Mongo calculate a friends_count field every time a document is created or updated, but I have no idea how to do that without slowing down the C++ code that writes to the DB.
Do you have any creative idea to help me? I'm lost, and I have to shorten the time drastically.
Thank you!
P.S. Some useful information: the C++ side writes the people to the DB in bulk once in a while. We can sync once in a while and mostly rely on the data being accurate. So, if that gives any of you an idea for a performance boost, I'd love to hear it.
Thanks!
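Since the bulk imports happen only occasionally, one direction (a minimal sketch, assuming MongoDB 4.2+ pipeline updates and that a periodic sync job after each import is acceptable) would be to materialize friends_count once per import rather than computing it in the hot aggregation:
// Hypothetical sync job, run after each C++ bulk import (MongoDB 4.2+):
// materialize friends_count once, so the paginated aggregation can sort
// on an indexed scalar instead of running $objectToArray per document.
async function syncFriendsCount() {
  await User.collection.updateMany({}, [
    { $set: { friends_count: { $size: { $ifNull: [{ $objectToArray: '$friends' }, []] } } } }
  ]);
}
// One-time index so the $sort stage can use it:
// db.users.createIndex({ friends_count: -1 })
With that in place, the $addFields stage could be dropped and the sort would reference friends_count directly; whether the staleness between syncs is acceptable depends on your requirements.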

How to update a field in a MongoDB collection so it increments linearly

I've been struggling with MongoDB for some time now, and the idea is quite simple: I have a collection, and I want to add a new ID field. This field is controlled by our API, which auto-increments it for each new document inserted.
The thing is, the collection already has some documents, so I must initialize each document with a sequential number, in no particular order:
collection: holiday {
  'date': date,
  'name': string
}
The collection has 12 documents, so each document should get an ID property, with values from 1 to 12. What kind of query or function should I use to do this? No restrictions so far, and performance is not a problem.
Maybe it is not optimal, but it works :)
var newId = 1;
var oldIds = [];
db.holiday.find().forEach(it => {
  const documentToMigrate = it;
  oldIds.push(it._id);
  // _id is immutable, so saving with a new _id inserts a copy of the
  // document instead of updating it in place.
  documentToMigrate._id = newId;
  db.holiday.save(documentToMigrate);
  ++newId;
})
// Clean up the original documents that still carry the old _id values.
db.holiday.remove({_id: {$in: oldIds}});
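If the goal is only to add a new ID field (rather than replace _id), a variant of the same loop could set a separate field instead; this is just a sketch, and the field name ID is an assumption:
// Hypothetical alternative: keep the original _id and add a sequential
// "ID" field, which avoids re-inserting and deleting documents.
var nextId = 1;
db.holiday.find().sort({ _id: 1 }).forEach(doc => {
  db.holiday.updateOne({ _id: doc._id }, { $set: { ID: nextId } });
  ++nextId;
});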

Creating compound indexes that will match queries in MongoDB

For our app I'm using the free tier (for now) on MongoDB Atlas.
Our documents have, among other fields, a start time, which is a Date, and a userId, which is an int.
export interface ITimer {
  id?: string,
  _id?: string, // id property assigned by mongo
  userId?: number,
  projectId?: number,
  description?: string,
  tags?: number[],
  isBillable?: boolean,
  isManual?: boolean,
  start?: Date,
  end?: Date,
  createdAt?: Date,
  updatedAt?: Date,
  createdBy?: number,
  updatedBy?: number
};
I'm looking for an index that will match the following query:
let query: FilterQuery<ITimer> = {
  start: {$gte: start, $lte: end},
  userId: userId,
};
Here, the start and end parameters are Date objects or ISO strings passed in to define a range of days.
Here I invoke the query, sorting results:
let result = await collection.find(query).sort({start: 1}).toArray();
It seems simple enough that the following index would match the above query:
{
  key: {
    start: 1,
    userId: 1,
  },
  name: 'find_between_dates_by_user',
  background: true,
  unique: false,
},
But using MongoDB Compass to monitor the collection, I see this index is not used.
Moreover, the MongoDB documentation specifically states that if an index matches a query completely, then no documents have to be examined, and the results are produced from the indexed information alone.
Unfortunately, for every query I run, I see that documents ARE examined, meaning my indexes do not match.
Any suggestions? I feel this should be very simple and straightforward, but maybe I'm missing something.
Attached is a screenshot from MongoDB Compass 'explaining' the query and execution.
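For what it's worth, MongoDB's equality-sort-range guidance would put the equality field (userId) before the range/sort field (start), so an index keyed the other way around should be able to support both the filter and the sort({start: 1}); a sketch, assuming the Node.js driver's createIndex call (the index name is an assumption):
// Hypothetical index following the equality-sort-range rule:
// equality on userId first, then start for the range filter and the sort.
await collection.createIndex(
  { userId: 1, start: 1 },
  { name: 'user_start_range' }
);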

MEAN Stack Data Model array of documents or document of arrays

I have a problem deciding how to design my data model for easy querying and extraction of field values later on.
The thing is, I use the MEAN stack and I have two collections in my MongoDB database: FA and FP.
var FASchema = new Schema({
  Timestamp: Date,
  ProgBW: Number,
  posFlexPot: Number,
  negFlexPot: Number,
  Leistungsuntergrenze: Number,
  Leistungsobergrenze: Number,
  posGesEnergie: Number,
  negGesEnergie: Number,
  Preissignal: Number,
  Dummy1: Schema.Types.Mixed,
  Dummy2: Schema.Types.Mixed,
  Dummy3: Schema.Types.Mixed
  // same as: Dummy: {}
});
var FPSchema = new mongoose.Schema({
  _id: { type: String }, // mongoose.Schema.Types.ObjectId,
  Demonstrator: Number,
  erstellt: { type: Date, 'default': Date.now },
  von: Date,
  bis: Date,
  FAs: [{ type: mongoose.Schema.Types.ObjectId, ref: "FA" }]
})
First question: is it possible to automatically create _id fields as strings? I heard it should be easier to query for IDs as strings later.
Second question: my FP schema contains instances (or rather subdocuments as plain text) inside the field "FAs" (how exactly this is done is a later topic). My question is, should I model that field as an array of documents
[{FAinstance1.field1.value, FAinstance1.field2.value}, {FAinstance2.field1.value, ...}]
with lots of FA documents, or should I do something like this (a document of arrays):
{FA.field1: [valueFA1, valueFA2, ...], FA.field2: [value2FA1, value2FA2, ...], ...}
For each FA document I later want to extract some values from fields like Timestamp, negGesEnergie, etc., and do this for every FA instance in the list. I extract them either from MongoDB or directly via an API (e.g. a POST request).
I want to plot the values later in a chart (written in JavaScript), where every FA1_instance.x1 and FA2_instance.x1 value (coming sequentially in the list) provides the numbers on the x-axis, and other fields provide the corresponding sequences of y-values (the values across the sequence of FA instances in the list).
What would be the easiest data model for extracting the values of each FA instance later on?
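For illustration, with the array-of-documents layout (the FAs field holding references or subdocuments), the per-field series for a chart can be derived with a simple map after populating; this is only a sketch, and the variable names are assumptions:
// Hypothetical sketch: build x/y series for a chart from an
// array-of-subdocuments layout, after populating the FAs references.
const fp = await FP.findById(fpId).populate('FAs');
const xValues = fp.FAs.map(fa => fa.Timestamp);      // x-axis series
const yValues = fp.FAs.map(fa => fa.negGesEnergie);  // y-axis series
The document-of-arrays layout avoids the map but makes it harder to keep the parallel arrays aligned when individual FA instances are added or removed.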

Calculating collection stats for a subset of documents in MongoDB

I know the cardinal rule of SE is not to ask a question without giving examples of what you've already tried, but in this case I can't find where to begin. I've looked at the MongoDB documentation, and it looks like there are only two ways to calculate storage usage:
db.collection.stats() returns statistics about the entire collection. In my case I need to know the amount of storage used by a subset of data within a collection (the data for a particular user).
Object.bsonsize(<document>) returns the storage size of a single record, which would require a cursor function to calculate the size of each document, one at a time. My only concern with this approach is performance with large amounts of data. If a single user has tens of thousands of documents, this process could take too long.
Does anyone know of a way to calculate the aggregate document size of a set of records within a collection efficiently and accurately?
Thanks for the help.
This may not be the most efficient or accurate way to do it, but I ended up using a Mongoose plugin to get the size of the JSON representation of the document before it's saved:
const mongoose = require('mongoose');

module.exports = exports = function defaultPlugin(schema, options) {
  schema.add({
    userId: { type: mongoose.Schema.Types.ObjectId, ref: "User", required: true },
    recordSize: Number
  });

  // Store the size of the JSON representation before each save.
  schema.pre('save', function(next) {
    this.recordSize = JSON.stringify(this).length;
    next();
  });
}
This converts the document to its JSON representation, gets its length, and stores the size in the document itself. I understand that this adds a tiny bit of extra storage to record the size, but it's the best I could come up with.
Then, to generate a storage report, I'm using a simple aggregate call to get the sum of all the recordSize values in the collection, filtered by userId:
mongoose.model('YourCollectionName').aggregate([
  {
    $match: {
      userId: userId
    }
  },
  {
    $group: {
      _id: null,
      recordSize: { $sum: '$recordSize' },
      recordCount: { $sum: 1 }
    }
  }
], function (err, results) {
  // Do something with your results
});
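As an aside, on MongoDB 4.4+ the $bsonSize aggregation operator can compute each document's actual BSON size at query time, so the sum could be produced without storing recordSize at all; a sketch (the collection name is assumed):
// Hypothetical alternative (MongoDB 4.4+): sum the real BSON size per user
// at query time instead of maintaining a recordSize field on save.
db.yourCollection.aggregate([
  { $match: { userId: userId } },
  { $group: {
      _id: null,
      totalSize: { $sum: { $bsonSize: '$$ROOT' } },
      recordCount: { $sum: 1 }
  } }
]);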