Meteor Mongo Collections find forEach cursor iteration and saving to ElasticSearch Problem - mongodb

I have a Meteor app that is connected to MongoDB.
In Mongo I have a collection with ~700k documents.
I have a weekly cron job that reads all the documents from the collection (using a Mongo cursor) and, in batches of 10k, inserts them into Elasticsearch so they are indexed.
let articles = []
Collections.Articles.find({}).forEach(function(doc) {
articles.push({
index: {_index: 'main', _type: 'article', _id: doc.id }
},
doc);
if (0 === articles.length % 10000) {
client.bulk({ maxRetries: 5, index: 'main', type: 'article', body: articles })
articles = []
}
})
Since forEach is synchronous and goes over every record without waiting, while client.bulk is asynchronous, this overloads the Elasticsearch server, which crashes with an out-of-memory exception.
Is there a way to pause the forEach while the insert is in progress? I tried async/await, but that does not work either.
let articles = []
Collections.Articles.find({}).forEach(async function(doc) {
articles.push({
index: {_index: 'main', _type: 'article', _id: doc.id }
},
doc);
if (0 === articles.length % 10000) {
await client.bulk({ maxRetries: 5, index: 'main', type: 'article', body: articles })
articles = []
}
})
Is there any way to achieve this?
EDIT: I am trying to achieve something like this, using promises:
let articles = []
Collections.Articles.find({}).forEach(function(doc) {
articles.push({
index: {_index: 'main', _type: 'article', _id: doc.id }
},
doc);
if (0 === articles.length % 10000) {
// Pause FETCHING rows with forEach
client.bulk({ maxRetries: 5, index: 'main', type: 'article', body: articles }).then(() => {
console.log('inserted')
// RESUME FETCHING rows with forEach
console.log("RESUME READING");
})
articles = []
}
})

I managed to get this working with ES2018 async iteration.
I got the idea from the question "Using async/await with a forEach loop".
Here is the working code:
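// Note: for await must run inside an async function (or an ES module with top-level await).
let articles = []
const cursor = Collections.Articles.find({})
for await (const doc of cursor) {
  // each document adds two entries: the action line and the document itself
  articles.push({
    index: {_index: 'main', _type: 'article', _id: doc.id }
  },
  doc);
  if (articles.length === 10000) {
    // wait for the bulk request to finish before reading further documents
    await client.bulk({ maxRetries: 5, index: 'trusted', type: 'artikel', body: articles })
    articles = []
  }
}
// flush the final partial batch, if any
if (articles.length > 0) {
  await client.bulk({ maxRetries: 5, index: 'trusted', type: 'artikel', body: articles })
}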
This works correctly and it manages to insert all the records into Elastic Search without crashing.
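The batching pattern can be tried without Meteor or Elasticsearch. The following is a minimal sketch under stated assumptions: fakeCursor is a hypothetical stand-in for the Mongo cursor, and sendBatch is an assumed async function (for example a wrapper around client.bulk). It shows that iteration pauses while a batch is being sent, and that the final partial batch is flushed after the loop.

```javascript
// Hypothetical stand-in for a Mongo cursor: an async generator yielding documents.
async function* fakeCursor(total) {
  for (let i = 0; i < total; i++) {
    yield { _id: i, title: `Article ${i}` }
  }
}

// Reads from any (async) iterable and sends documents in fixed-size batches.
// `sendBatch` is an assumed async function, e.g. a wrapper around client.bulk.
async function indexInBatches(source, batchSize, sendBatch) {
  let batch = []
  let sent = 0
  for await (const doc of source) {
    batch.push(doc)
    if (batch.length === batchSize) {
      await sendBatch(batch) // iteration pauses here until the batch is done
      sent += batch.length
      batch = []
    }
  }
  if (batch.length > 0) {
    await sendBatch(batch) // flush the final partial batch
    sent += batch.length
  }
  return sent
}
```

Calling indexInBatches(fakeCursor(25), 10, sendBatch) invokes sendBatch three times, with 10, 10 and 5 documents, and never has more than one batch in flight.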

If you are concerned about the unthrottled iteration, you may use the internal Meteor._sleepForMs method, which lets you put an async timeout in your sync-style code:
Collections.Articles.find().forEach((doc, index) => {
console.log(index, doc._id)
Meteor._sleepForMs(timeout)
})
This works fine within the Meteor environment (Meteor.startup, Meteor.methods, Meteor.publish).
Your cron job is likely not running within this environment (= Fiber), so you may write a wrapper that binds the environment:
const bound = fct => Meteor.bindEnvironment(fct)
const iterateSlow = bound(function (timeout) {
Collections.Articles.find().forEach((doc, index) => {
console.log(index, doc._id)
Meteor._sleepForMs(timeout)
})
return true
})
iterateSlow(50) // iterates with 50ms timeout
Here is a complete minimal example, that you can reproduce with a fresh project:
// create a minimal collection
const MyDocs = new Mongo.Collection('myDocs')
// fill the collection
Meteor.startup(() => {
for (let i = 0; i < 100; i++) {
MyDocs.insert({})
}
})
// bind helper
const bound = fct => Meteor.bindEnvironment(fct)
// iterate docs with interval between
const iterateSlow = bound(function (timeout) {
MyDocs.find().forEach((doc, index) => {
console.log(index, doc._id)
Meteor._sleepForMs(timeout)
})
return true
})
// simulate external environment, like when cron runs
setTimeout(() => {
iterateSlow(50)
}, 2000)


Mochawesome with Cypress - how to get aggregated charts at higher level?

I've just started using mochawesome with Cypress (9.7). Our test structure is basically a number of spec files, each following something like the following format:
describe('(A): description of this spec', () => {
describe ('(B): description of test abc', () => {
before(() => {
// do specific set up bits for this test
})
it('(C): runs test abc', () => {
// do actual test stuff
})
})
})
Where within each spec file there would be a single 'A' describe block, but there can be many 'B' level blocks (each with a single 'C') - done this way because the before block for each 'C' is always different - I couldn't use a beforeEach.
When I run my various spec files, each structured similarly to the above, the mochawesome output is mostly correct - I get a collapsible block for each spec file at level 'A', each with multiple collapsible blocks at level B, each with test info as expected at level C.
But... The circular charts are only displayed at level B. What I was hoping, was that it might be possible to have aggregated charts at level A, and a further aggregated chart for all the level A blocks.
Not sure I've explained this brilliantly(!), but hopefully someone understands, and can offer a suggestion?!
In cypress-mochawesome-reporter there's an alternative setup using on('after:run') which can perform the aggregation.
In Cypress v9.7.0
// cypress/plugins/index.js
const { beforeRunHook, afterRunHook } = require('cypress-mochawesome-reporter/lib');
const { aggregateResults } = require('./aggregate-mochawesome-report-chart');
module.exports = (on, config) => {
on('before:run', async (details) => {
await beforeRunHook(details);
});
on('after:run', async () => {
aggregateResults(config)
await afterRunHook();
});
};
In Cypress v10+
// cypress.config.js
const { defineConfig } = require('cypress');
const { beforeRunHook, afterRunHook } = require('cypress-mochawesome-reporter/lib');
const { aggregateResults } = require('./aggregate-mochawesome-report-chart');
module.exports = defineConfig({
reporter: 'cypress-mochawesome-reporter',
video: false,
retries: 1,
reporterOptions: {
reportDir: 'test-report',
charts: true,
reportPageTitle: 'custom-title',
embeddedScreenshots: true,
inlineAssets: false,
saveAllAttempts: false,
saveJson: true
},
e2e: {
setupNodeEvents(on, config) {
on('before:run', async (details) => {
await beforeRunHook(details);
});
on('after:run', async () => {
aggregateResults(config)
await afterRunHook();
});
},
},
});
The module to do the aggregation is
// aggregate-mochawesome-report-chart.js
const path = require('path');
const fs = require('fs-extra')
function aggregateResults(config) {
const jsonPath = path.join(config.reporterOptions.reportDir, '.jsons', 'mochawesome.json');
const report = fs.readJsonSync(jsonPath)
const topSuite = report.results[0].suites[0]
aggregate(topSuite)
fs.writeJsonSync(jsonPath, report)
}
function aggregate(suite, level = 0) {
const childSuites = suite.suites.map(child => aggregate(child, level + 1))
suite.passes = suite.passes.concat(childSuites.map(child => child.passes)).flat()
suite.failures = suite.failures.concat(childSuites.map(child => child.failures)).flat()
suite.pending = suite.pending.concat(childSuites.map(child => child.pending)).flat()
suite.skipped = suite.skipped.concat(childSuites.map(child => child.skipped)).flat()
if (!suite.tests.length && suite.suites.length && suite.suites[0].tests.length) {
// trigger the chart when the describe block itself has no tests
suite.tests = [
{
"title": "Aggregate of tests",
"duration": 20,
"pass": true,
"context": null,
"err": {},
"uuid": "0",
"parentUUID": suite.uuid,
},
]
}
return suite
}
module.exports = {
aggregateResults
}
The function aggregate() recursively loops down through child suites and adds the test results to the parent.
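To make the recursion concrete, here is a reduced sketch of the same roll-up on a hand-written suite tree. The suite objects are a simplified, hypothetical shape (only title, passes, failures and suites), and the entries in passes and failures are placeholder test ids rather than real report data.

```javascript
// Simplified roll-up: children are aggregated first, then merged into the parent.
function rollUp(suite) {
  for (const child of suite.suites) {
    rollUp(child) // aggregate grandchildren into the child first
    suite.passes = suite.passes.concat(child.passes)
    suite.failures = suite.failures.concat(child.failures)
  }
  return suite
}

// Level A describe with two level B describes, each holding one test result.
const specSuite = {
  title: '(A)',
  passes: [],
  failures: [],
  suites: [
    { title: '(B) abc', passes: ['c1'], failures: [], suites: [] },
    { title: '(B) def', passes: [], failures: ['c2'], suites: [] },
  ],
}

rollUp(specSuite)
console.log(specSuite.passes.length, specSuite.failures.length) // 1 1
```

After the call, the level A suite carries the combined results of its level B children, which is what the chart at that level needs.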
json files
Note the json file is different at the point where afterRunHook runs and at the end of the test run.
If you have the option saveJson: true set, you will get a final json file in the report directory called index.json.
At the afterRunHook stage the file is mochawesome.json.
[Screenshot: report before aggregation]
[Screenshot: report after aggregation]

MongoDB query with 300k documents takes more than 30 seconds

Ok, as said in title, I have "performance issue" where I need to get all documents from a collection but it takes too long. Players collection contains around 300k documents with small size and query in service goes like this:
async getAllPlayers() {
const players = await this.playersCollection.find({}, {projection: { playerId: 1, name: 1, surname: 1, shirtNumber: 1, position: 1 }}).toArray();
return players;
}
Overall size is 6.4MB. I'm using Fastify adapter, fastify-compress and mongodb native driver. If I remove projection, it takes almost a minute.
Any idea how to improve this?
The best time I get is 8 seconds, where fast-json-stringify gives me a boost of more than 10 seconds over 300k records:
'use strict'
// run fresh mongo
// docker run --name temp --rm -p 27017:27017 mongo
const fastify = require('fastify')({ logger: true })
const fjs = require('fast-json-stringify')
const toString = fjs({
type: 'object',
properties: {
playerId: { type: 'integer' },
name: { type: 'string' },
surname: { type: 'string' },
shirtNumber: { type: 'integer' },
}
})
fastify.register(require('fastify-mongodb'), {
forceClose: true,
url: 'mongodb://localhost/mydb'
})
fastify.get('/', (request, reply) => {
const dataStream = fastify.mongo.db.collection('foo')
.find({}, {
limit: 300000,
projection: { playerId: 1, name: 1, surname: 1, shirtNumber: 1, position: 1 }
})
.stream({
transform(doc) {
return toString(doc) + '\n'
}
})
reply.type('application/jsonl')
reply.send(dataStream)
})
fastify.get('/insert', async (request, reply) => {
const collection = fastify.mongo.db.collection('foo')
const batch = collection.initializeOrderedBulkOp();
for (let i = 0; i < 300000; i++) {
const player = {
playerId: i,
name: `Name ${i}`,
surname: `surname ${i}`,
shirtNumber: i
}
batch.insert(player);
}
const { result } = await batch.execute()
return result
})
fastify.listen(8080)
In any case, you should consider either:
paginating your output, or
pushing the data into a bucket (like S3) and returning to the client a URL to download the file directly; this speeds the process up a lot and spares your Node.js process the data streaming.
Note that compression in Node.js is a heavy process, so it slows the response down a lot. A reverse proxy such as nginx can handle compression instead (once gzip is enabled in its configuration), so you don't need to implement it in your business-logic server.
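The pagination option could look roughly like this. It is only a sketch of keyset pagination, assuming playerId is unique and indexed; buildPageQuery, lastPlayerId and pageSize are hypothetical names, and the returned filter and options are meant to be passed to collection.find(filter, options).

```javascript
// Builds the filter and options for one page of players.
// Assumption: playerId is unique and indexed, so it can serve as the page cursor.
function buildPageQuery(lastPlayerId, pageSize) {
  // first page: no lower bound; later pages: continue after the last id seen
  const filter = lastPlayerId == null ? {} : { playerId: { $gt: lastPlayerId } }
  const options = {
    sort: { playerId: 1 },
    limit: pageSize,
    projection: { playerId: 1, name: 1, surname: 1, shirtNumber: 1, position: 1 },
  }
  return { filter, options }
}

// Usage inside a route handler (not executed here):
//   const { filter, options } = buildPageQuery(lastId, 1000)
//   const page = await collection.find(filter, options).toArray()
```

The client sends back the playerId of the last document it received to fetch the next page, so each request only touches pageSize documents.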

Variables exporting error in MongoDB error

Question
I have provided my code below for reference. I'm using MongoDB and discord.js v12. So basically, I have made a !info command which shows some general info of the user.
What this code does is, it checks through the member's roles, and regarding which role they have, it calculates their total claim time (for giveaways etc.). The problem here, is with the donator role. I can't figure out why I can't use the donates variable outside the db.findOne block. Here, data.content.length shows the total donates of the users, which means donates * 5 is +5 claim time for each donate.
My Code
const moment = require('moment');
module.exports = {
name: 'info',
async execute(client, message, args, Discord){
const member = message.mentions.members.first() || message.guild.members.cache.get(args[0]) || message.member;
const db = require('../models/d-schema');
db.findOne({ guildid: message.guild.id, user: member.user.id }, async(err, data)=>{
if(err) throw err;
if(data){
const donates = parseInt(data.content.length);
}
})
var DefaultTime = 10;
var support = 0;
var donate = 0;
var boost = 0;
const userRoles = member.roles.cache.map((r) => r.name);
if (userRoles.includes("୨・supporter")) {
support = 3;
}
if (userRoles.includes("୨・donator")) {
donate = donates * 5;
}
if (userRoles.includes("୨・booster")) {
boost = 10;
}
const TotalTime = DefaultTime + support + donate + boost;
const embed = new Discord.MessageEmbed()
.setThumbnail(member.user.displayAvatarURL( {dynamic: true} ))
.addFields(
{name: member.user.tag, value: member.user, inline: true},
{name: 'Nickname', value: `${member.nickname !== null ? member.nickname : 'None'}`, inline: true},
{name: 'Is Bot', value: member.user.bot, inline: true},
{name: 'Joined', value: `${moment.utc(member.joinedAt).format("MMMM Do YYYY")}`, inline: true},
{name: 'Created', value: `${moment.utc(member.user.createdAt).format("MMMM Do YYYY")}`, inline: true},
{name: 'Claim Time', value: `${TotalTime} seconds`, inline: true},
)
.setFooter(`ID : ${member.user.id}`)
.setTimestamp()
.setColor('00ffcc')
message.channel.send(embed)
}
}
You cannot use the donates variable because you are declaring it inside the db.findOne() block. This is called variable scope. For a better understanding, you can read this answer.
If you want to use it outside of that block, you have to declare it beforehand, like this:
let donates;
db.findOne({ guildid: message.guild.id, user: member.user.id }, async(err, data)=>{
if(err) throw err;
if(data){
donates = parseInt(data.content.length);
}
})
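Now the variable is in scope outside of the db.findOne() block :) Note, however, that the callback runs asynchronously, so donates stays undefined until the query has finished; any code that reads it must run after that point (for example, inside the callback).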
Edit:
Alternative way:
It would be easier to await the query with async/await. That way, everything can live in the same scope!
Example:
These two methods give the same result:
const data = await Model.findOne({ ... });
console.log(data);
Model.findOne({ ... }, (err, data) => {
console.log(data);
});
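Applied to the claim-time calculation, the awaited variant might look like the sketch below. fakeModel is a hypothetical stand-in for the Mongoose model so the snippet runs on its own, the role names are simplified (the decorative prefix is dropped), and claimTime and getClaimTime are illustrative helper names.

```javascript
// Stand-in for the Mongoose model (assumption); the real findOne() can be awaited the same way.
const fakeModel = {
  findOne: async (filter) => (filter.user === 'u1' ? { content: ['a', 'b', 'c'] } : null),
}

// Pure helper: claim time from role names and the number of donations.
function claimTime(roleNames, donates) {
  let total = 10 // default time
  if (roleNames.includes('supporter')) total += 3
  if (roleNames.includes('donator')) total += donates * 5
  if (roleNames.includes('booster')) total += 10
  return total
}

async function getClaimTime(model, userId, roleNames) {
  const data = await model.findOne({ user: userId })
  // donates lives in the same scope as the code that uses it
  const donates = data ? data.content.length : 0
  return claimTime(roleNames, donates)
}
```

Because the query is awaited before the roles are evaluated, donates is always assigned by the time it is read, and a user without a database entry simply counts as zero donations.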
Suggestion from Lioness100

Asynchronous Issues with JEST and MongoDB

I am getting inconsistent results with JEST when I try to remove items from a MongoDB Collection using the beforeEach() Hook.
My Mongoose schema and model defined as:
// Define Mongoose wafer sort schema
const waferSchema = new mongoose.Schema({
productType: {
type: String,
required: true,
enum: ['A', 'B'],
},
updated: {
type: Date,
default: Date.now,
index: true,
},
waferId: {
type: String,
required: true,
trim: true,
minlength: 7,
},
sublotId: {
type: String,
required: true,
trim: true,
minlength: 7,
},
});
// Define unique key for the schema
const Wafer = mongoose.model('Wafer', waferSchema);
module.exports.Wafer = Wafer;
My JEST tests:
describe('API: /WT', () => {
// Happy Path for Posting Object
let wtEntry = {};
beforeEach(async () => {
wtEntry = {
productType: 'A',
waferId: 'A01A001.3',
sublotId: 'A01A001.1',
};
await Wafer.deleteMany({});
// I also tried to pass in done and then call done() after the delete
});
describe('GET /:id', () => {
it('Return Wafer Sort Entry with specified ID', async () => {
// Create a new wafer Entry and Save it to the DB
const wafer = new Wafer(wtEntry);
await wafer.save();
const res = await request(apiServer).get(`/WT/${wafer.id}`);
expect(res.status).toBe(200);
expect(res.body).toHaveProperty('productType', 'A');
expect(res.body).toHaveProperty('waferId', 'A01A001.3');
expect(res.body).toHaveProperty('sublotId', 'A01A001.1');
});
});
});
So the error I always get is related to duplicate keys when I run my tests more than once:
MongoError: E11000 duplicate key error collection: promis_tests.promiswts index: waferId_1_sublotId_1 dup key: { : "A01A001.3", : "A01A001.1" }
But I do not understand how I can get this duplicate key error if the beforeEach() were firing properly. Am I trying to clear the collection improperly? I've tried passing a done argument to the beforeEach callback and invoking it after the delete command. I've also tried implementing the delete in beforeAll(), afterEach(), and afterAll() but still get inconsistent results. I'm pretty stumped on this one. I might just remove the schema key altogether, but I would like to understand what is going on here with the beforeEach(). Thanks in advance for any advice.
It might be because of how you are using the promise API that mongoose has to offer. Mongoose functions like deleteMany() do not return a real promise; they return a Query object. A Query is thenable, so await does execute it, but calling .exec() at the end of the function chain gives you a genuine promise, e.g. await collection.deleteMany({}).exec(). If the delete is not actually awaited, you run into a race condition. deleteMany() also accepts a callback, so you could always wrap it in a promise. I would do something like this:
describe('API: /WT', () => {
// Happy Path for Posting Object
const wtEntry = {
productType: 'A',
waferId: 'A01A001.3',
sublotId: 'A01A001.1',
};
beforeEach(async () => {
await Wafer.deleteMany({}).exec();
});
describe('GET /:id', () => {
it('Return Wafer Sort Entry with specified ID', async () => {
expect.assertions(4);
// Create a new wafer Entry and Save it to the DB
const wafer = await Wafer.create(wtEntry);
const res = await request(apiServer).get(`/WT/${wafer.id}`);
expect(res.status).toBe(200);
expect(res.body).toHaveProperty('productType', 'A');
expect(res.body).toHaveProperty('waferId', 'A01A001.3');
expect(res.body).toHaveProperty('sublotId', 'A01A001.1');
});
});
});
Also, always declare the expected number of assertions (expect.assertions) when testing asynchronous code:
https://jestjs.io/docs/en/asynchronous.html
You can read more about mongoose promises and query objects here
https://mongoosejs.com/docs/promises.html
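To illustrate the difference between a thenable query object and a real promise, here is a small sketch in plain JavaScript. FakeQuery is a hypothetical class, not part of mongoose; it only mimics the two behaviours in question, a then() method and an exec() method that returns a genuine Promise.

```javascript
// Minimal thenable, standing in for a query object (hypothetical class).
class FakeQuery {
  constructor(value) {
    this.value = value
  }
  // await calls then() on any object that has one
  then(resolve, reject) {
    return Promise.resolve(this.value).then(resolve, reject)
  }
  // exec() hands back a real Promise
  exec() {
    return Promise.resolve(this.value)
  }
}

async function demo() {
  const viaThenable = await new FakeQuery(3) // works: await accepts thenables
  const viaExec = await new FakeQuery(3).exec() // works: real promise
  return [viaThenable, viaExec]
}
```

Both awaits resolve to the same value; the difference is only that the object returned by exec() is an actual Promise instance, while the query object itself is not.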
Without deleting the schema index this seems to be the most reliable solution. Not 100% sure why it works over async await Wafer.deleteMany({});
beforeEach((done) => {
wtEntry = {
productType: 'A',
waferId: 'A01A001.3',
sublotId: 'A01A001.1',
};
mongoose.connection.collections.promiswts.drop(() => {
// Run the next test!
done();
});
});

Correct way to seed MongoDB with references via mongoose

I have three schemas, one which references two others:
userSchema
{ name: String }
postSchema
{ content: String }
commentSchema
{
content: String,
user: { ObjectID, ref: 'User' },
post: { ObjectID, ref: 'Post' }
}
How can I seed this database in a sane, scalable way? Even using bluebird promises it quickly becomes a nightmare to write.
My attempt so far involves multiple nested promises and is very hard to maintain:
User
.create([{ name: 'alice' }])
.then(() => {
return Post.create([{ content: 'foo' }])
})
.then(() => {
User.find().then(users => {
Post.find().then(posts => {
// `users` isn't even *available* here!
Comment.create({ content: 'bar', user: users[0], post: posts[0] })
})
})
})
This is clearly not the correct way of doing this. What am I missing?
Not sure about bluebird, but the native Node.js Promise.all should do the job:
Promise.all([
User.create([{ name: 'alice' }]),
Post.create([{ content: 'foo' }])
]).then(([users, posts]) => {
const comments = [
{ content: 'bar', user: users[0], post: posts[0] }
];
return Comment.create(comments);
})
If you want to seed a database with automatic references, use Seedgoose.
This is the easiest seeder for you to use. You don't need to write any program files, but only data files. And Seedgoose handles smart references for you. And by the way, I'm the author and maintainer of this package.
Try this, it will work fine:
Note: Promise.all makes sure that both queries have completed and then returns the results in an array: [Users, Posts].
If any query fails during execution, the error is handled by the catch block attached to Promise.all.
let queryArray = [];
queryArray.push(User.create([{ name: 'alice' }]));
queryArray.push(Post.create([{ content: 'foo' }]));
Promise.all(queryArray).then(([Users, Posts]) => {
  const comments = [
    // use the destructured Posts array (the original referenced an undefined `posts`)
    { content: 'bar', user: Users[0], post: Posts[0] }
  ];
  return Comment.create(comments);
}).catch(err => {
  console.log("Error: ", err);
})
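The same seeding flow can also be written with async/await, which keeps users and posts in one scope. The sketch below uses in-memory stand-ins for the models so it runs without a database; makeFakeModel is a hypothetical helper, and its create() is assumed to resolve to the created documents, as the snippets above expect.

```javascript
// In-memory stand-in for a Mongoose model (hypothetical); create() resolves
// to the created documents, each with a generated _id.
function makeFakeModel() {
  let nextId = 1
  return {
    create: async (docs) => docs.map(d => ({ _id: nextId++, ...d })),
  }
}

async function seed(User, Post, Comment) {
  // users and posts do not depend on each other, so create them in parallel
  const [users, posts] = await Promise.all([
    User.create([{ name: 'alice' }]),
    Post.create([{ content: 'foo' }]),
  ])
  // comments reference both, so they are created afterwards
  return Comment.create([
    { content: 'bar', user: users[0]._id, post: posts[0]._id },
  ])
}
```

Calling seed(makeFakeModel(), makeFakeModel(), makeFakeModel()) resolves to a single comment whose user and post fields hold the ids of the seeded user and post.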