Mongo indexing in Meteor - mongodb

Not sure I'm understanding indexing mongo queries in Meteor. Right now, none of my queries are indexed. On some of the pages in the app, there are 15 or 20 links that instigate to a unique mongo query. Would each query be indexed individually?
For example, if one of the queries is something like:
Template.myTemplate.helpers({
...
if (weekly === "1") {
var firstWeekers = _.where(progDocs, {Week1: "1"}),
firstWeekNames = firstWeekers.map(function (doc) {
return doc.FullName;
});
return Demographic.find({ FullName: { $in: firstWeekNames }}, { sort: { FullName: 1 }});
}
...
})
How would I implement each of the indexes?

Firstly minimongo (mongo on the client-side) runs in memory so indexing is much less of a factor than on disk. To minimize network consumption you also generally want to keep your collections on the client fairly small making indexing on the client-side even less important.
On the server however indexing can be critical to good performance. There are two common methods to setup indexes on the server:
via the meteor mongo shell, i.e. db.demographic.createIndex( { FullName: 1 } )
via setting the field to be indexed in your schema when using the Collection2 package. See aldeed:schema-index

Related

Mongodb to fetch top 100 results for each category

I have a collection of transactions that has below schema:
{
_id,
client_id,
billings,
date,
total
}
What I want to achieve is to get the 10 latest transaction models based on the date for a list of client IDs. I don't think the $slice well as the use case is mostly for embedded arrays.
Currently, I am iterating through the client_ids and using find with the limit but it is extremely slow.
UPDATE
Example
https://mongoplayground.net/p/urKH7HOxwqC
This shows two clients with 10 transaction each on different days, I want to write a query that would return latest 5 transaction for each.
Any suggestions of how to query data to make it faster?
The most efficient way would be to just execute multiple queries, 1 for each client, like so:
const clients = await db.collection.distinct('client_id');
const results = await Promise.all(
clients.map((clientId) => db.collection.find({client_id: clientId}).sort({date: -1}).limit(5))
)
To improve this performance make sure you have an index on client_id and date. If for whatever reason you can't built these indexes I'd recommend using this following pipeline (with syntax available starting version 5.3+):
db.collection.aggregate([
{
$group: {
_id: "$client_id",
latestTransactions: {
"$bottomN": {
"n": 5,
"sortBy": {
"date": 1
},
"output": "$$ROOT"
}
}
}
}
])
Mongo Playground

Performance issues related to $nin/$ne querying in large database

I am working on a pipeline where multiple microservices(workers) modify and add attributes to documents. Some of them have to make sure the document was already processed by another microservice and/or make sure they don't process a document twice.
I've already tried two different data structures for this: array and object:
{
...other_attributes
worker_history_array: ["worker_1", "worker_2", ...]
woker_history_object: {"worker_1": true, "worker_2": true, ...}
}
I also created indexes for the 2 fields
{ "worker_history_array": 1 }
{ "worker_history_object.$**": 1 }
Both data structures use the index and work very well when querying for the existence of a worker in the history:
{
"worker_history_array": "worker_1"
}
{
"worker_history_object.worker_1": true
}
But I can't seem to find a query that is fast/ hits the index when looking if a worker did not already process this document. All of those queries perform awfully:
{
"worker_history_array": { $ne: "worker_1" }
}
{
"worker_history_array": { $nin: ["worker_1"] }
}
{
"worker_history_object.worker_1": { $exists: false }
}
{
"worker_history_object.worker_1":{ $not: { $exists: true } }
}
{
"worker_history_object.worker_1": { $ne: true }
}
Performance is already bad with 500k documents, but the database will grow to millions of documents.
Is there a way to improve the query performance?
Can I work around the low selectivity of $ne and $nin?
Different index?
Different data structure?
I don't think it matters but I'm using MongoDB Atlas (MongoDB 4.4.1, cluster with read replicas) on Google Cloud and examined the performance of the queries with MongoDB Compass.
Additional Infos/Restrictions:
Millions of records
Hundreds of workers
I don't know all workers beforehand
Not every worker processes every document (some may only work on documents with type: "x" while others work only on documents with type: "y")
No worker should have knowledge about the pipeline, only about the worker that directly precedes it.
Any help would be greatly appreciated.

Mongo db Collection find returns from the first entry not from last from client side

I am using mongodb and i am querying the database with some conditions which are working fine but the results are coming from the first entry to last where as i want to query from the last added entry to the collection in database
TaggedMessages.find({taggedList:{$elemMatch:{tagName:tagObj.tagValue}}}).fetch()
Meteor uses a custom wrapped version of Mongo.Collection and Mongo.Cursor in order to support reactivity out of the box. It also abstracts the Mongo query API to make it easier to work with.
This is why the native way of accessing elements from the end is not working here.
On the server
In order to use $natural correctly with Meteor you can to use the hint property as option (see the last property in the documentation) on the server:
const selector = {
taggedList:{ $elemMatch:{ tagName:tagObj.tagValue } }
}
const options = {
hint: { $natural : -1 }
}
TaggedMessages.find(selector, options).fetch()
Sidenote: If you ever need to access the "native" Mongo driver, you need to use rawCollection
On the client
On the client you have no real access to the Mongo Driver but to a seemingly similar API (called the minimongo package). There you won't have $natural available (maybe in the future), so you need to use sort with a descenging order:
const selector = {
taggedList:{ $elemMatch:{ tagName:tagObj.tagValue } }
}
const options = {
sort: { createdAt: -1 }
}
TaggedMessages.find(selector, options).fetch()

MongoDB big collection aggregation is slow

I'm having a problem with the time of my mongoDB query, from a node backend using mongoose. i have a collection called people that has 10M records, and every record is queried from the backend and inserted from another part of the system that's written in c++ and needs to be very fast.
this is my mongoose schema:
{
_id: {type: String, index: {unique: true}}, // We generate our own _id! Might it be related to the slowness?
age: { type: Number },
id_num: { type: String },
friends: { type: Object }
}
schema.index({'id_num': 1}, { unique: true, collation: { locale: 'en_US', strength: 2 } })
schema.index({'age': 1})
schema.index({'id_num': 'text'});
Friends is an object looking like that: {"Adam": true, "Eve": true... etc.}.
there's no meaning to the value, and we use dictionaries to avoid duplicates fast on C++.
also, we didn't encounter a set/unique-list type of field in mongoDB.
The Problem:
We display people in a table with pagination. the table has abilities of sort, search, and select number of results.
At first, I queried all people and searched, sorted and paged it on the js. but when there are a lot of documents, It's turning problematic (memory problems).
The next thing i did was to try to fit those manipulations (searching, sorting & paging) on my query.
I used mongo's text search- but it not matches a partial word. is there any way to search a partial insensitive string? (I prefer not to use regex, to avoid unexpected problems)
I have to sort before paging, so I tried to use mongo sort. the problem is, that when the user wants to sort by "Friends", we want to return the people sorted by their number of friends (number of entries in the object).
The only way i succeeded pulling it off was using $addFields in aggregation:
{$addFields: {$size: {$ifNull: [{$objectToArray: '$friends'}, [] ]}}}
this addition is taking forever! when sorting by friends, the query takes about 40s for 8M people, and without this part it takes less than a second.
I used limit and skip for pagination. it works ok, but we have to wait until the user requests the second page and make another very long query.
In the end, this is the the interesting code part:
const { sortBy, sortDesc, search, page, itemsPerPage } = req.query
// Search never matches partial string
const match = search ? {$text: {$search: search}} : {}
const sortByInDB = ['age', 'id_num']
let sort = {$sort : {}}
const aggregate = [{$match: match}]
// if sortBy is on a simple type, we just use mongos sort
// else, we sortBy friends, and add a friends_count field.
if(sortByInDB.includes(sortBy)){
sort.$sort[sortBy] = sortDesc === 'true' ? -1 : 1
} else {
sort.$sort[sortBy+'_count'] = sortDesc === 'true' ? -1 : 1
// The problematic part of the query:
aggregate.push({$addFields: {friends_count: {$size: {
$ifNull: [{$objectToArray: '$friends'},[]]
}}}})
}
const numItems = parseInt(itemsPerPage)
const numPage = parseInt(page)
aggregate.push(sort, {$skip: (numPage - 1)*numItems}, {$limit: numItems})
// Takes a long time (when sorting by "friends")
let users = await User.aggregate(aggregate)
I tried indexing all simple fields, but the time is still too much.
The only other solution i could think of, is making mongo calculate a field "friends_count" every time a document is created or updated- but i have no idea how to do it, without slowing our c++ that writes to the DB.
Do you have any creative idea to help me? I'm lost, and I have to shorten the time drastically.
Thank you!
P.S: some useful information- the C++ area is writing the people to the DB in a bulk once in a while. we can sync once in a while and mostly rely on the data to be true. So, if that gives any of you any idea for a performance boost, i'd love to hear it.
Thanks!

Control Meteor Mongo Query Batch Size

I'm setting up simple pub/sub on a mongo collection in a meteor application.
// On Server
Meteor.publish('records', function (params) {
if (params.gender === "Male") {
params.gender = "M";
} else if (params.gender === "Female") {
params.gender = "F";
}
return Records.find({
gender: params.gender || {$exists: true},
age: {
$lte: params.ageRange.max,
$gte: params.ageRange.min
}
});
});
//On Client
this.computationBlock = Tracker.autorun( () => {
Meteor.subscribe("records",
_.pick(this, ["gender","ageRange"]));
RecordActions.recordsChange(Records.find({}).fetch());
});
These queries occasionally return 10,000+ (total) documents, and the issue is that it returns about 300-500 documents per batch. I have visualizations of aggregated metrics from the query, and so these visualizations often remain in flux for over 10 seconds as each new batch of the query result is brought down from the subscription/publication. I'm pretty sure that if I could configure the batchSize property of the mongo cursor returned from Collection.find(), I could alleviate the problem simply by returning more documents per batch, but I cannot find how to do this in meteor land.
cursor.batchSize is undefined in both the client and server code, although it is simple part of mongo native api (http://docs.mongodb.org/manual/reference/method/cursor.batchSize/). Has anyone had any luck configuring this parameter, or even better, telling meteor to fetch the entire query in "one-shot" as opposed to batching the results?