Meteor MongoDB pub/sub memory error

I'm doing data visualizations using Meteor, running React and D3 for the view. Today I decided to populate the MongoDB server with more documents (a total of 30 documents, netting ~50k lines each). There was no issue when the database held 'only' 4 documents, but now I'm seeing:
Exception while polling query {"collectionName":"Metrics","selector":{},"options":{"transform":null}}: MongoError: Query exceeded the maximum allowed memory usage of 40 MB. Please consider adding more filters to reduce the query response size.
This is my collections.js file (autopublish is off):
if (Meteor.isServer) {
  const remoteCollectionString = "notForHumanConsumption";
  database = new MongoInternals.RemoteCollectionDriver(remoteCollectionString);
  Meteor.publish('metricsDB', function() {
    return Metrics.find({});
  });
}
Metrics = new Mongo.Collection("Metrics", { _driver: database });
if (Meteor.isClient) {
  Meteor.startup(function() {
    Session.set('data_loaded', false);
    console.log(Session.get('data_loaded'));
  });
  Meteor.subscribe('metricsDB', function() {
    // Set the reactive session as true to indicate that the data have been loaded
    Session.set('data_loaded', true);
    console.log(Session.get('data_loaded'));
  });
}
Adding any kind of sort to the publish function at least gets the console to log true, but it does so almost instantly. The terminal shows no error, but I'm stuck and no data reaches the app.
Update: I removed 10 entries from the collection, leaving 20, and now the collection is ~28 MB instead of ~42 MB for the 30 items. The app loads, albeit slowly.
The data structure looks pretty much like this:
{
  _id: BLAKSBFLUyiy79a6fs9Pjhadkh&SA86886Daksh,
  DateQueried: dateString,
  DataSet1: [
    {
      ID,
      Name,
      Properties: [
        {
          ID,
          Name,
          Nr,
          Status: [0, 1, 2, 3],
          LocationData: {
            City,
            CountryCode,
            PostalCode,
            Street,
            StreetNumber
          }
        }
      ]
    }
  ],
  DataSet2: [
    {
      ID,
      Name,
      Nr,
      Obj: {
        NrR,
        NrGS,
        LengthOfR
      }
    }
  ]
}
In each document, DataSet1 usually has 9 items. Each Properties array inside it can contain up to ~1,900 items (around 500 on average). DataSet2 normally has around 49 items.
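The numbers in the update give a rough sense of scale. ~42 MB for 30 documents and ~28 MB for 20 documents both work out to about 1.4 MB per document, so only about 28 such documents fit under a 40 MB budget. This assumes the query's memory use roughly tracks the raw data size, which is an assumption and not something the error message states. The sketch below also counts the nested entries in one document. `describeMetricDoc` and the synthetic sample are hypothetical, built only from the shape and averages described above.

```javascript
// Rough arithmetic from the update: ~42 MB / 30 docs (and ~28 MB / 20 docs).
const perDocMB = 42 / 30; // about 1.4 MB per document
console.log(Math.floor(40 / perDocMB)); // 28

// Hypothetical helper (not part of Meteor or MongoDB): count the nested
// entries in one Metrics document and approximate its serialized size.
function describeMetricDoc(doc) {
  const dataSet1 = doc.DataSet1 || [];
  const propertyCount = dataSet1.reduce(
    (sum, entry) => sum + (entry.Properties || []).length,
    0
  );
  return {
    dataSet1Count: dataSet1.length,
    propertyCount,
    dataSet2Count: (doc.DataSet2 || []).length,
    // JSON length is only a rough proxy for the BSON size MongoDB works with.
    approxBytes: JSON.stringify(doc).length,
  };
}

// Synthetic document with the shape and averages described above:
// 9 DataSet1 entries x 500 Properties, and 49 DataSet2 entries.
const makeProperty = (i) => ({
  ID: i,
  Name: `prop-${i}`,
  Nr: i,
  Status: [0, 1, 2, 3],
  LocationData: { City: "City", CountryCode: "XX", PostalCode: "0000", Street: "Street", StreetNumber: "1" },
});
const sample = {
  _id: "example",
  DateQueried: "2020-01-01",
  DataSet1: Array.from({ length: 9 }, (_, i) => ({
    ID: i,
    Name: `set-${i}`,
    Properties: Array.from({ length: 500 }, (_, j) => makeProperty(j)),
  })),
  DataSet2: Array.from({ length: 49 }, (_, i) => ({
    ID: i, Name: `item-${i}`, Nr: i, Obj: { NrR: 1, NrGS: 2, LengthOfR: 3 },
  })),
};

const summary = describeMetricDoc(sample);
console.log(summary.propertyCount); // 4500
```

With these averages, a single document carries about 4,500 Properties objects (9 x ~500), so each extra document adds a lot to an unfiltered find({}).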

Related

Faster way to get delta of 2 collections in Mongo

I have a complete list of catalog inventory data in Mongo. The basic schema is:
productSku (string)
inventory (number)
This collection holds approximately 14 million records.
I also have a list of actively sold products with a similar schema. Right now it's a JSON file with approximately 23,000 records.
Every 5 hours, the 14 million records are updated with the latest inventory data. When that happens, I need to create a CSV of the latest inventory for the 23,000 products.
I'm doing it like this:
const inventoryModel = require('../data/inventoryModel');
const activeProducts = require('./activeProducts.json');

// await is only valid inside an async function in a CommonJS module
async function buildInventoryUpdate() {
  const inventoryUpdate = [];
  for (const product of activeProducts) {
    let latest = await inventoryModel.findOne({ productSku: product.sku }).exec();
    latest = latest ? latest._doc : null;
    // If there's no current inventory record for the product
    if (!latest) {
      // If there was previously an inventory greater than 0
      if (product.inventory) {
        // Set the latest inventory to 0
        inventoryUpdate.push({ sku: product.sku, inventory: 0 });
      }
    } else {
      // If there's a change in inventory
      if (latest.inventory != product.inventory) {
        inventoryUpdate.push({ sku: product.sku, inventory: latest.inventory });
      }
    }
  }
  return inventoryUpdate;
}
This gives me an array, inventoryUpdate, that I can use to create a CSV for a mass update. This works, but it's very slow: it takes about an hour to complete!
I was thinking about adding activeProducts to Mongo as well. If I could somehow keep the execution of the logic within Mongo, it would be a lot faster, but if that's possible, it's beyond my current understanding and ability.
Anyone have any suggestions?
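One commonly suggested direction, sketched here under assumptions and not as a tested answer, is to replace the ~23,000 sequential findOne round trips with one batched query. For example, `inventoryModel.find({ productSku: { $in: skus } }).lean().exec()`, where `skus` is the list of active SKUs, followed by the comparison in memory. Separately, if productSku is not indexed, each findOne may have to scan a large part of the 14-million-document collection, so an index on that field is worth checking first. The function below covers only the in-memory diff. Its name and input shapes are hypothetical, and it keeps the original rules: no record but previously in stock → 0, and changed inventory → latest value.

```javascript
// Hypothetical in-memory diff. `activeProducts` is the JSON list ({ sku, inventory });
// `latestRecords` is the result of ONE batched query (assumed shape: { productSku, inventory }).
function computeInventoryUpdates(activeProducts, latestRecords) {
  // Index the latest records by SKU for constant-time lookups.
  const latestBySku = new Map(
    latestRecords.map((record) => [record.productSku, record.inventory])
  );
  const updates = [];
  for (const product of activeProducts) {
    if (!latestBySku.has(product.sku)) {
      // No current record: report 0 only if there used to be stock.
      if (product.inventory) {
        updates.push({ sku: product.sku, inventory: 0 });
      }
    } else {
      const latest = latestBySku.get(product.sku);
      // Strict comparison assumes both inventories are numbers.
      if (latest !== product.inventory) {
        updates.push({ sku: product.sku, inventory: latest });
      }
    }
  }
  return updates;
}

const active = [
  { sku: "A", inventory: 5 },
  { sku: "B", inventory: 3 },
  { sku: "C", inventory: 0 },
  { sku: "D", inventory: 2 },
];
const latestRecordsSample = [
  { productSku: "A", inventory: 5 },
  { productSku: "B", inventory: 7 },
];
const result = computeInventoryUpdates(active, latestRecordsSample);
console.log(result); // [ { sku: 'B', inventory: 7 }, { sku: 'D', inventory: 0 } ]
```

A Map turns each lookup into a hash access, so the diff itself is linear in the number of active products. Most of the remaining cost is the single query.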

MongoDB alerts for frequent queries

I have this query that inserts (or updates) a record when a listener is listening to a song:
const nowplayingData = {"type": "S", "station": req.params.stationname, "song": data[1], "artist": data[0], "timeplay": npdate};
LNowPlaying.findOneAndUpdate(
  nowplayingData,
  { $addToSet: { history: [uuid] } },
  { upsert: true },
  function(err) {
    if (err) {
      console.log('ERROR when submitting round');
      console.log(err);
    }
  }
);
I have been getting the following emails for the last week, and they are starting to get annoying:
[Screenshot: MongoDB alerts]
These alerts don't show anything wrong with the query or the code.
I also have the following query that checks for the latest userID matching the station name.
I believe this is the query setting off the alerts, because of the number of times we run the same query over and over (it runs every 10 seconds, and up to 1,000 people may be requesting the info at the same time).
var query = LNowPlaying.findOne({ "station": req.params.stationname, "history": { $in: [y] } }).sort({ "_id": -1 });
query.exec(function (err, docs) {
  if (err) {
    console.error("error", err);
    return res.status(500).json(err);
  }
  res.status(200).json({
    data: docs
  });
});
I am wondering how I can improve this so that I stop getting the alerts. I know I need to create an index, which I believe should cover the station name and the history array.
I tried to create a new index on the fields station and history, but got this error:
Index build failed: 6ed6d3f5-bd61-4d70-b8ea-c62d7a10d3ba: Collection AdStitchr.NowPlaying ( 8190d374-8f26-4c31-bcf7-de4d11803385 ) :: caused by :: Field 'history' of text index contains an array in document: { _id: ObjectId('5f102ab25b43e19dabb201f5'), artist: "Cobra Dukes", song: "Leave The Light On (Hook N Sling Remix) [nG]", station: "DRN1", timeplay: new Date(1594898580000), __v: 0, history: [ "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE1OTQ5ODE0MjQsImlhdCI6MTU5NDg5NTAyNCwib2lkIjoicmFkaW9tZWRpYSJ9.ECVxBzAYZcpyueBP_Xlyncn41OgrezrOF8Dn3CdAnOU" ] }
Can you not index an Array?
This is how I am trying to create the index:
[Screenshot: my index creation]

Inconsistent results with Meteor's pub/sub feature

I'm experiencing inconsistent results with Meteor's pub/sub feature, and I suspect it's a source of confusion for a lot of developers hitting the threshold of an MVP built in Meteor becoming a production app.
Maybe this is a limitation of MergeBox:
Let's say I have a collection called Events, in which I have document-oriented structures, i.e., nested arrays and objects. An Events document might look like so:
// an Events document //
{
  _id: 'abc',
  name: 'Some Event',
  participation: {
    'userOneId': {
      games: {
        'gameOneId': {
          score: 100,
          bonus: 10
        },
        'gameTwoId': {
          score: 100,
          bonus: 10
        }
      }
    },
    'userTwoId': {
      games: {
        'gameOneId': {
          score: 70,
          bonus: 15
        }
      },
      contests: {
        'contestOneId': [2, 3, 6, 7, 4],
        'contestTwoId': [9, 3, 7, 2, 1]
      }
    }
  }
}
So at these events, users can optionally participate in games of certain types and contests of certain types.
Now, I want to restrict subscriptions to the Events collection based on the user (show only this user's participation), and sometimes I'm only interested in changes to one subset of the data (like showing only the user's scores on 'gameOneId').
So I've created a publication like so:
Meteor.publish("events.participant", function(eventId, userId) {
  if (!Meteor.users.findOne({_id: this.userId})) return this.ready();
  check(eventId, String);
  check(userId, String);
  const includeFields = {
    name: 1,
    [`participation.${userId}`]: 1
  };
  return Events.find({_id: eventId}, {fields: includeFields});
});
This publication seems to work fine on the client if I do:
// in onCreated of events template //
template.autorun(function() {
  const
    eventId = FlowRouter.getParam('id'),
    userId = Meteor.userId(),
    subscription = template.subscribe('events.participant', eventId, userId);
  if (subscription.ready()) {
    const event = Events.findOne({_id: eventId}, parseIncludeFields(['name', `participation.${userId}`]));
    template.props.event.set(event);
  }
});
Happily, I can use the returned Event document, which includes only the name field and all of the user's participation data.
But, later, in another template if I do:
// in onCreated of games template //
template.autorun(function() {
  const
    eventId = FlowRouter.getParam('id'),
    gameId = FlowRouter.getParam('gameId'),
    userId = Meteor.userId(),
    subscription = template.subscribe('events.participant', eventId, userId);
  if (subscription.ready()) {
    const event = Events.findOne({_id: eventId}, {fields: {[`participation.${userId}.games.${gameId}`]: 1}});
    template.props.event.set(event);
  }
});
I sometimes get back the data at event.participation[userId].games[gameId], and sometimes I don't: the object that's supposed to be at gameId is non-existent, even though it exists in the Mongo document and the subscription should include it. Why?
The only difference between the two calls to Events.findOne() is that in the latter, I'm not requesting the name field. But if this is a problem, why? If minimongo already has the document, who cares if I request parts of it?
The subscriptions in both templates are identical. I'm doing this because the games template is available at a route, so the user could go straight to the games URL, bypassing the events template altogether, and I want to be sure the client has the document it needs to render correctly.
The only way I've gotten around this is to make a straight Meteor method call to the server in the games template to fetch the subset of interest, but this seems like a cop-out.
If you've read this far, you're a champ!
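One possible workaround, offered as a hedged sketch and not as an explanation of the behavior above, avoids the second, narrower minimongo projection. Keep the projection the subscription already applies, fetch the document with `Events.findOne({_id: eventId})`, and pick out the nested game in plain JavaScript. `getGameResult` is a hypothetical helper, and the `subscription.ready()` check around it is still needed.

```javascript
// Hypothetical helper: read one user's result for one game from an Events
// document fetched WITHOUT a nested field projection.
function getGameResult(event, userId, gameId) {
  // Optional chaining yields undefined instead of throwing when any level
  // (the event, participation, the user, games, or the game) is missing.
  return event?.participation?.[userId]?.games?.[gameId];
}

const event = {
  _id: "abc",
  name: "Some Event",
  participation: {
    userOneId: {
      games: {
        gameOneId: { score: 100, bonus: 10 },
        gameTwoId: { score: 100, bonus: 10 },
      },
    },
  },
};

console.log(getGameResult(event, "userOneId", "gameTwoId")); // { score: 100, bonus: 10 }
console.log(getGameResult(event, "userTwoId", "gameOneId")); // undefined
```

A missing game then shows up explicitly as undefined, which the template can handle, rather than depending on how minimongo applies a dotted projection.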

How do I publish two random items from a Meteor collection?

I'm making an app where two random things from a collection are displayed to the user. Every time the user refreshes the page or clicks on a button, she would get another random pair of items.
For example, if the collection were of fruits, I'd want something like this:
apple vs banana
peach vs pineapple
banana vs peach
The code below is for the server side and it works except for the fact that the random pair is generated only once. The pair doesn't update until the server is restarted. I understand it is because generate_pair() is only called once. I have tried calling generate_pair() from one of the Meteor.publish functions but it only sometimes works. Other times, I get no items (errors) or only one item.
I don't mind publishing the entire collection and selecting random items from the client side. I just don't want to crash the browser if Items has 30,000 entries.
So to conclude, does anyone have any ideas of how to get two random items from a collection appearing on the client side?
var first_item, second_item;
// This is the best way I could find to get a random item from a Meteor collection
// Every item in Items has a 'random_number' field with a randomly generated number between 0 and 1
var random_item = function() {
  return Items.find({
    random_number: {
      $gt: Math.random()
    }
  }, {
    limit: 1
  });
};
// Generates a pair of items and ensures that they're not duplicates.
var generate_pair = function() {
  first_item = random_item();
  second_item = random_item();
  // Regenerate second item if it is a duplicate
  while (first_item.fetch()[0]._id === second_item.fetch()[0]._id) {
    second_item = random_item();
  }
};
generate_pair();
Meteor.publish('first_item', function() {
  return first_item;
});
// Is this good Meteor style to have two publications doing essentially the same thing?
Meteor.publish('second_item', function() {
  return second_item;
});
The problem with your approach is that subscribing over and over on the client to the same publication with the same arguments (no arguments in this case) only subscribes you once to the server-side logic, because Meteor optimizes its internal pub/sub mechanism.
To get the server-side publish code to re-execute and send two new random documents, you need to pass a useless random argument to your publication. Your client-side code then subscribes to the publication again and again with a new random number, stopping the previous subscription each time so that the old pair is removed from the client and a new random pair arrives.
Here is a full implementation of this pattern:
server/server.js
function randomItemId(){
  // get the total items count of the collection
  var itemsCount = Items.find().count();
  // get a random number (N) between [0 , itemsCount - 1]
  var random = Math.floor(Random.fraction() * itemsCount);
  // choose a random item by skipping N items
  var item = Items.findOne({}, {
    skip: random
  });
  return item && item._id;
}
function generateItemIdPair(){
  // return an array of 2 random items ids
  var result = [
    randomItemId(),
    randomItemId()
  ];
  // regenerate the second id until it differs from the first
  while(result[0] == result[1]){
    result[1] = randomItemId();
  }
  // the pair of distinct ids
  return result;
}
Meteor.publish("randomItems", function(random){
  // `random` is unused: it only makes each subscription unique
  var pair = generateItemIdPair();
  // publish the 2 items whose ids are in the random pair
  return Items.find({
    _id: {
      $in: pair
    }
  });
});
client/client.js
// every 5 seconds subscribe to 2 new random items
var handle = null;
Meteor.setInterval(function(){
  // stop the previous subscription so its items are removed from the client
  if (handle) {
    handle.stop();
  }
  handle = Meteor.subscribe("randomItems", Random.fraction(), function(){
    console.log("fetched these random items :", Items.find().fetch());
  });
}, 5000);
You'll need to run meteor add random for this code to work.
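As written, the retry loop in generateItemIdPair() never terminates when the collection holds fewer than two items. In that case randomItemId() keeps returning the same id, or `undefined` for an empty collection. A loop-free alternative, which is a different technique from the answer's, draws the second index from the remaining count - 1 positions and shifts it past the first. `twoDistinctIndices` is a hypothetical helper. On the server, each index would be passed as `skip` to Items.findOne, as randomItemId() does.

```javascript
// Hypothetical helper: draw two DISTINCT indices in [0, count) without a retry loop.
// `rand` defaults to Math.random; Meteor's Random.fraction could be passed instead.
function twoDistinctIndices(count, rand = Math.random) {
  if (count < 2) {
    throw new RangeError("need at least two items to form a pair");
  }
  const first = Math.floor(rand() * count);
  // Choose among the count - 1 remaining slots, then skip over `first`.
  let second = Math.floor(rand() * (count - 1));
  if (second >= first) {
    second += 1;
  }
  return [first, second];
}

const [a, b] = twoDistinctIndices(5);
console.log(a !== b); // true
```

Each index is uniform over its range, and the pair is always distinct, so the number of database reads per publish is fixed at two.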
Meteor.publish 'randomDocs', ->
  ids = _(Docs.find().fetch()).pluck '_id'
  randomIds = _(ids).sample 2
  Docs.find _id: $in: randomIds
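The CoffeeScript answer relies on Underscore's `_.sample(list, n)`, which returns n distinct random elements. For readers not using Underscore, a partial Fisher-Yates shuffle gives the same kind of result in plain JavaScript. `sampleWithoutReplacement` is a hypothetical name. Like the answer, this approach still loads every _id into memory on the server, which may matter once the collection reaches tens of thousands of items.

```javascript
// Partial Fisher-Yates shuffle: returns `k` distinct elements of `items`
// chosen uniformly at random, without modifying the input array.
function sampleWithoutReplacement(items, k, rand = Math.random) {
  const copy = items.slice();
  const n = Math.min(k, copy.length);
  for (let i = 0; i < n; i += 1) {
    // Pick an index from the not-yet-chosen tail [i, copy.length).
    const j = i + Math.floor(rand() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

const ids = ["apple", "banana", "peach", "pineapple"];
const pair = sampleWithoutReplacement(ids, 2);
console.log(pair.length, pair[0] !== pair[1]); // 2 true
```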
Here's another approach, which uses the excellent publishComposite package to populate matches in a local (client-only) collection so it doesn't conflict with other uses of the main collection:
if (Meteor.isClient) {
  randomDocs = new Mongo.Collection('randomDocs');
}
if (Meteor.isServer) {
  Meteor.publishComposite("randomDocs", function(select_count) {
    return {
      collectionName: "randomDocs",
      find: function() {
        let self = this;
        _.sample(baseCollection.find({}).fetch(), select_count).forEach(function(doc) {
          self.added("randomDocs", doc._id, doc);
        }, self);
        self.ready();
      }
    }
  });
}
in onCreated: this.subscribe("randomDocs",3);
(then in a helper): return randomDocs.find({}, {limit: 3});

Sequentially numbering MongoDB subdocuments

I have a MongoDB collection of "departments" and each department has a sub-collection of "tickets". In the old MySQL-based system, we assigned a number to each ticket (per department) by counting existing records and adding one.
This gives each department a human-readable numbering system to identify tickets: 1, 2, 3, etc.
However, in Mongo, auto increment fields like this aren't used and I'm finding that because operations may be async (using Mongoose in a NodeJS app), counting existing records may not be trustworthy (same with incrementing a counter on the department collection)
I've spent time looking for a solution but am finding it difficult to sort out unrelated topics.
Is there any trusted way to make a sequential number system that relies on a custom query?
Here's an example of the models/save code I'm toying with:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// Declared first so OrganizationSchema can embed it
var DepartmentSchema = new Schema({
  name: {
    type: String,
    required: true
  },
  // counter read and incremented by the save code
  lastTicketId: {
    type: Number,
    default: 0
  }
});
var OrganizationSchema = new Schema({
  name: {
    type: String,
    required: true
  },
  departments: [DepartmentSchema]
});
// Tickets aren't stored as subdocuments in Departments because there could be
// a lot of them, and I didn't want that to affect performance
var TicketSchema = new Schema({
  project: {
    type: [{ type: Schema.Types.ObjectId, ref: 'Department' }],
    required: true
  },
  summary: {
    type: String,
    required: true
  }
});
// in-progress save code
Organization.findOne({ 'departments._id': department }, function(err, org) {
  var lastTicketId = org.departments[0].lastTicketId;
  console.log('loading ' + lastTicketId);
  var ticket = new Ticket({
    department: department,
    summary: req.body.summary
  });
  ticket.save(function(err, result) {
    if (err) {
      return next(err);
    }
    Organization.findOneAndUpdate(
      { 'departments._id': department },
      { $inc: { 'departments.$.lastTicketId': 1 } },
      function(err, result) {
        console.log('saving ' + result.departments[0].lastTicketId);
      }
    );
  });
});
This save code is an API endpoint, so I'm bulk-testing 20-some API requests from a for loop. That means they're coming in very fast and I can really see the async effect on the numbering.
The console.log output is:
loading 0
loading 0
saving 0
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
saving 1
saving 2
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
loading 1
saving 3
saving 4
saving 9
saving 5
saving 10
saving 6
saving 11
saving 7
saving 12
saving 17
saving 8
saving 13
saving 18
saving 23
saving 14
saving 19
saving 24
saving 15
saving 20
saving 25
saving 16
saving 21
saving 22
There are two possible techniques with MongoDB:
a counters collection
an optimistic loop
See the MongoDB article here:
Something that is not mentioned in this article: a counters collection guarantees a unique incremental id in a multi-process environment, but if this id is used for insertions, the insertion order is not 100% guaranteed to correspond to the id.
If insertion order is critical, use the optimistic loop technique.
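The optimistic loop works like this: read the current highest number for the department, try to insert with the next number, and retry if a unique index rejects the insert as a duplicate. The simulation below is in-memory only. `FakeTicketCollection` stands in for a tickets collection with a unique index on (department, number), and the names and retry limit are hypothetical. In real MongoDB, the rejection would surface as a duplicate key error with code 11000.

```javascript
// In-memory stand-in for a tickets collection with a unique index on
// (department, number). All names here are hypothetical.
class FakeTicketCollection {
  constructor() {
    this.keys = new Set();
    this.maxByDept = new Map();
  }
  async findMaxNumber(department) {
    await Promise.resolve(); // simulate a round trip
    return this.maxByDept.get(department) || 0;
  }
  async insert(department, number) {
    await Promise.resolve();
    const key = `${department}:${number}`;
    if (this.keys.has(key)) {
      const err = new Error("duplicate key");
      err.code = 11000; // MongoDB's duplicate key error code
      throw err;
    }
    this.keys.add(key);
    this.maxByDept.set(department, Math.max(number, this.maxByDept.get(department) || 0));
    return number;
  }
}

// Optimistic loop: guess the next number, retry only on a duplicate key error.
async function insertWithNextNumber(tickets, department, maxAttempts = 100) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const next = (await tickets.findMaxNumber(department)) + 1;
    try {
      return await tickets.insert(department, next);
    } catch (err) {
      if (err.code !== 11000) throw err;
    }
  }
  throw new Error("too many collisions");
}

// Fire 20 concurrent inserts, like the bulk test in the question.
const tickets = new FakeTicketCollection();
Promise.all(
  Array.from({ length: 20 }, () => insertWithNextNumber(tickets, "dept-1"))
).then((numbers) => {
  console.log(new Set(numbers).size, Math.max(...numbers)); // 20 20
});
```

Each request either succeeds or loses to a request that already took that number, so the numbers end up unique and gap-free, and each number is assigned at insert time.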
If you're using Mongoose, you could try auto-increment: https://www.npmjs.com/package/mongoose-auto-increment
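var mongoose = require('mongoose');
var Schema = mongoose.Schema;
var autoIncrement = require('mongoose-auto-increment');
// initialize the plugin with your connection before using it
autoIncrement.initialize(mongoose.connection);

var OrganizationSchema = new mongoose.Schema({
  ...
});
var DepartmentSchema = new mongoose.Schema({
  ...
});
var TicketSchema = new Schema({
  ticketNumber: { type: Number, required: true },
  project: {
    type: [{ type: Schema.Types.ObjectId, ref: 'Department' }],
    required: true
  },
  ...
});
TicketSchema.plugin(autoIncrement.plugin, { model: 'Ticket', field: 'ticketNumber' });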
This will auto-increment the ticketNumber each time you insert a ticket document. It's not incremented per department, though; it's just a unique number per ticket.
The latest ticket number is stored in a special collection in MongoDB (mongoose-auto-increments), so there's no problem picking the next number even in a web-farmed, multi-server setup.
Be careful when inserting Ticket documents manually, though: you would need to update the mongoose-auto-increments collection by hand too.
In the end, the question was a little off because I had assumed Mongo was the issue. However, reviewing the logs I posted, I realized that the numbers were eventually being incremented correctly, so my problem had to be a JavaScript issue.
I realized that querying the last ID, then saving the ticket, then updating the last ID was not only inefficient but also left room for async queries to overlap.
I decided to rewrite the save logic to validate the ticket without saving it, then increment the last ID in a single query, and then save the ticket with the proper value.
So far it appears to work perfectly.
// Ensure validation passes before we increment an ID
// so we avoid disconnected IDs if it fails later
ticket.validate(function(err) {
  if (err) {
    return next(err);
  }
  // Find and update the last ID in one atomic operation
  // to avoid async race conditions
  Organization.findOneAndUpdate(
    { 'departments._id': department },
    { $inc: { 'departments.$.lastTicketId': 1 } },
    // return the updated document instead of the original
    { new: true },
    function(err, result) {
      if (err) {
        return next(err);
      }
      // read the counter from the matched department, not just the first one
      ticket.displayId = result.departments.id(department).lastTicketId;
      ticket.save(function(err, savedTicket) {
        if (err) {
          return next(err);
        }
        // success
      });
    }
  );
});
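A small in-memory simulation shows why doing the increment in a single query helps. It uses a hypothetical `FakeCounter` and no Mongoose. When each request reads the counter and writes it back in separate async steps, concurrent requests read the same value. When the read and increment happen in one step, as a single `$inc` update does on the server, every request gets its own number.

```javascript
// In-memory stand-in for a department's lastTicketId (hypothetical).
class FakeCounter {
  constructor() {
    this.value = 0;
  }
  async read() {
    await Promise.resolve(); // simulate a round trip
    return this.value;
  }
  async write(value) {
    await Promise.resolve();
    this.value = value;
  }
  // One step, like a single findOneAndUpdate with $inc returning the new value.
  async incrementAndGet() {
    await Promise.resolve();
    this.value += 1;
    return this.value;
  }
}

// Read, then write in a separate step: other requests can interleave.
async function racyNextId(counter) {
  const last = await counter.read();
  await counter.write(last + 1);
  return last + 1;
}

// Single atomic increment.
function atomicNextId(counter) {
  return counter.incrementAndGet();
}

function simulate(nextId, requests) {
  const counter = new FakeCounter();
  // Fire all requests at once, like the for loop of API calls in the question.
  return Promise.all(Array.from({ length: requests }, () => nextId(counter)));
}

simulate(racyNextId, 5).then((ids) => console.log("racy:", ids.join(","))); // racy: 1,1,1,1,1
simulate(atomicNextId, 5).then((ids) => console.log("atomic:", ids.join(","))); // atomic: 1,2,3,4,5
```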