Faster way to get delta of 2 collections in Mongo

I have a complete list of catalog inventory data in Mongo
The basic schema is:
productSku (string)
inventory (number)
This collection consists of approximately 14 million records.
I have another list of actively sold products with a similar schema.
Right now I have it as a json file.
It consists of approximately 23,000 records.
Every 5 hours the 14 million records are updated with the latest inventory data.
Once that happens I need to create a CSV of the 23,000 products' latest inventory.
I'm doing it like this:
const inventoryModel = require('../data/inventoryModel');
const activeProducts = require('./activeProducts.json');

const inventoryUpdate = [];
for (const product of activeProducts) {
  let latest = await inventoryModel.findOne({ productSku: product.sku }).exec();
  latest = latest ? latest._doc : null;

  // If there's no current inventory record for the product
  if (!latest) {
    // If there was previously an inventory greater than 0
    if (product.inventory) {
      // Set the latest inventory to zero
      inventoryUpdate.push({ sku: product.sku, inventory: 0 });
    }
  } else {
    // If there's a change in inventory
    if (latest.inventory != product.inventory) {
      inventoryUpdate.push({ sku: product.sku, inventory: latest.inventory });
    }
  }
}
This gives me an array inventoryUpdate that I can use to create a CSV for a mass update. This works fine but it's very slow. It takes about an hour to complete!
I was thinking about maybe adding activeProducts to Mongo as well; if I could somehow keep the execution of the logic within Mongo, it would presumably be a lot faster. But even if that's possible, it's beyond my current understanding and ability.
Anyone have any suggestions?
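One way to speed this up without moving the logic into Mongo would be to replace the 23,000 sequential findOne round trips with a handful of batched $in queries and do the diff in memory. A rough sketch, assuming the same inventoryModel and activeProducts shapes as above (the chunk size is arbitrary):

const inventoryModel = require('../data/inventoryModel');
const activeProducts = require('./activeProducts.json');

async function buildInventoryUpdate() {
  const inventoryUpdate = [];

  // Fetch the matching inventory documents in chunks instead of one query per product
  const chunkSize = 1000;
  const latestBySku = new Map();
  for (let i = 0; i < activeProducts.length; i += chunkSize) {
    const skus = activeProducts.slice(i, i + chunkSize).map(p => p.sku);
    const docs = await inventoryModel.find({ productSku: { $in: skus } }).lean().exec();
    for (const doc of docs) latestBySku.set(doc.productSku, doc);
  }

  // Same delta logic as before, now entirely in memory
  for (const product of activeProducts) {
    const latest = latestBySku.get(product.sku);
    if (!latest) {
      if (product.inventory) inventoryUpdate.push({ sku: product.sku, inventory: 0 });
    } else if (latest.inventory != product.inventory) {
      inventoryUpdate.push({ sku: product.sku, inventory: latest.inventory });
    }
  }
  return inventoryUpdate;
}

Either way, an index on productSku is what actually makes the lookups fast; without one, every query scans the 14 million documents.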

Related

How to use Redis and MongoDb together

I have to build a web application to track user activity, and I'm having trouble understanding how to use Redis for tracking what online users are doing and Mongo to store that data for later analysis.
I could use just Mongo, but I'm worried about the number of writes needed to follow what a user is doing. So I was thinking of writing the live data to Redis and moving it into Mongo once it becomes old, meaning once the data is no longer relevant for the online view.
I thought about putting a gateway between Mongo and Redis; could that be RabbitMQ?
Any suggestions?
Should I just use Mongo?
Just an example of code I wrote:
Front-end (Angular application / Socket.io):
setInterval(function () {
  socket.emit('visitor-data', {
    referringSite: document.referrer,
    browser: navigator.sayswho,
    os: navigator.platform,
    page: location.pathname
  });
}, 3000);
Back-end (Node.js / Socket.io):
socket.on('visitor-data', function(data) {
  visitorsData[socket.id] = data;
});
visitorsData is just held in an in-memory array, but I need to build a scalable application, so I can't keep storing data this way.
Then I have some functions like this for computing the data:
function computeRefererCounts() {
  var referrerCounts = {};
  for (var key in visitorsData) {
    var referringSite = visitorsData[key].referringSite || '(direct)';
    if (referringSite in referrerCounts) {
      referrerCounts[referringSite]++;
    } else {
      referrerCounts[referringSite] = 1;
    }
  }
  return referrerCounts;
}
Just some numbers, I estimated something like:
1 million users per day
15 million activities per day
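The split described above (keep the hot data in Redis, move it into Mongo once it's no longer needed for the online view) can be sketched roughly like this; the ioredis/mongodb packages, the key name, and the 'activities' collection are all assumptions for illustration:

const Redis = require('ioredis');
const { MongoClient } = require('mongodb');

const redis = new Redis();

// Called from the socket handler instead of keeping visitorsData in process memory
async function recordActivity(socketId, data) {
  await redis.rpush('activities:live', JSON.stringify({ socketId, ...data, ts: Date.now() }));
}

// Run periodically (cron, setInterval, or a worker fed by a queue such as RabbitMQ)
async function flushToMongo(mongoUrl) {
  const client = await MongoClient.connect(mongoUrl);
  try {
    const raw = await redis.lrange('activities:live', 0, -1);
    if (raw.length === 0) return;
    const docs = raw.map(s => JSON.parse(s));
    await client.db('analytics').collection('activities').insertMany(docs);
    // Trim only what was copied, so activities written during the flush are kept
    await redis.ltrim('activities:live', raw.length, -1);
  } finally {
    await client.close();
  }
}

Whether RabbitMQ belongs in between depends mainly on how strong the delivery guarantees between the two stores need to be; a periodic flush like this is often enough to start with.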

Meteor MongoDB pub/sub memory error

I'm doing data visualizations using Meteor, with React and D3 for the view. Today I decided to populate the MongoDB server with more documents (a total of 30 documents, netting ~50k lines each). There was no issue before, running the database with 'only' 4 documents, but now I'm seeing:
Exception while polling query {"collectionName":"Metrics","selector":{},"options":{"transform":null}}: MongoError: Query exceeded the maximum allowed memory usage of 40 MB. Please consider adding more filters to reduce the query response size.
This is my collections.js file, because autopublish is off.
if (Meteor.isServer) {
  const remoteCollectionString = "notForHumanConsumption";
  database = new MongoInternals.RemoteCollectionDriver(remoteCollectionString);

  Meteor.publish('metricsDB', function() {
    return Metrics.find({})
  });
}

Metrics = new Mongo.Collection("Metrics", { _driver: database });

if (Meteor.isClient) {
  Meteor.startup(function() {
    Session.set('data_loaded', false);
    console.log(Session.get('data_loaded'))
  });

  Meteor.subscribe('metricsDB', function(){
    // Set the reactive session as true to indicate that the data have been loaded
    Session.set('data_loaded', true);
    console.log(Session.get('data_loaded'))
  });
}
Adding any type of sort in the publish function seems to at least get the console to log true, but it does so pretty much instantly. The terminal shows no error, but I'm stuck and not getting any data to the App.
Update: I decided to remove 10 entries from the collection, limiting it to 20, and now the collection is ~28MB, instead of ~42MB for the 30 items. The App is loading, albeit slowly.
The data structure looks pretty much like this:
{
  _id: BLAKSBFLUyiy79a6fs9Pjhadkh&SA86886Daksh,
  DateQueried: dateString,
  DataSet1: [
    {
      ID,
      Name,
      Properties: [
        {
          ID,
          Name,
          Nr,
          Status: [0, 1, 2, 3],
          LocationData: {
            City,
            CountryCode,
            PostalCode,
            Street,
            StreetNumber
          }
        }
      ]
    }
  ],
  DataSet2: [
    {
      ID,
      Name,
      Nr,
      Obj: {
        NrR,
        NrGS,
        LengthOfR
      }
    }
  ]
}
In each document DataSet1 is usually 9 items long. Properties in there can contain up to ~1900 (they average around 500) items. DataSet2 is normally around 49 items.
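The 40 MB limit is being hit because the publication ships every document in full. One hedged sketch of narrowing it with a fields projection and a limit (which fields and how many documents to keep are assumptions, picked to mirror the structure above):

if (Meteor.isServer) {
  Meteor.publish('metricsDB', function() {
    // Only ship the fields the visualization actually needs, and cap the count,
    // so the polled result set stays well under the memory limit
    return Metrics.find({}, {
      fields: { DateQueried: 1, DataSet2: 1 },
      sort: { DateQueried: -1 },
      limit: 20
    });
  });
}

If the client only needs aggregated numbers rather than the raw documents, a Meteor method that runs the aggregation on the server and returns just the result avoids shipping the documents at all.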

MongoDB Collections & Data Structure

I have just started using MongoDB so I apologize if this is an obvious or simple question.
I am trying to store my information using the following data structure:
database: {
  Customers: {
    Date_added_1: {
      {customer 1 info},
      {customer 2 info},
      {customer 3 info}
    }
    Date_added_2: {
      {customer 1 info},
      {customer 2 info}
    }
  }
  Employees: {
    Date_hired_1: {
      {employee 1 info},
      {employee 2 info}
    }
    Date_hired_2: {
      {employee 1 info}
    }
  }
}
The code I have written to input the information into the database looks like this:
from pymongo import MongoClient

def addLeadsToDatabase(personCategory, personInformation, date):
    client = MongoClient('localhost', port#)
    db = client.database[personCategory][date]
    db.insert(personInformation)

person_to_add = {'Name': 'John Smith', 'Phone': '888-888-8888', 'Email': 'example@email.com'}
addLeadsToDatabase('Customers', person_to_add, '06/28/2017')
However, when navigating through the database it looks like each [personCategory][date] combination is getting saved as a separate collection, rather than the data being stored within the personCategory collection and then within a date sub-collection.
Therefore when I run 'show collections' in the MongoDB shell it outputs:
Customers.6/25/2017
Employees.6/25/2017
Customers.6/26/2017
Customers.6/27/2017
Employees.6/27/2017
Rather than just:
Customers
Employees
With the date category stored within each.
Is there a way to store the data the way I have described so it is not making a new collection each time I run the code and just storing data within the appropriate collection(s)?
Please consider:
from pymongo import MongoClient

def addLeadsToDatabase(personCategory, personInformation):
    client = MongoClient('localhost', port#)
    db = client.database[personCategory]
    db.insert(personInformation)

# I would suggest you use a real datetime here instead of a string
person_to_add = {'Name': 'John Smith', 'Phone': '888-888-8888', 'Email': 'example@email.com', 'Date': '06/28/2017'}
addLeadsToDatabase('Customers', person_to_add)
Summary:
Keep it simple. Later on you can filter by date or even aggregate your query by this date.
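For instance, once every customer document carries a date field, filtering or grouping the single Customers collection works directly (a sketch in mongo shell syntax; whether you store a string as above or a real date only changes the query value):

// With the string date used above:
db.Customers.find({ Date: '06/28/2017' })

// With a real date, range queries become possible:
db.Customers.find({ Date: { $gte: ISODate('2017-06-28'), $lt: ISODate('2017-06-29') } })

// Counting new customers per day with the aggregation pipeline:
db.Customers.aggregate([
  { $group: { _id: '$Date', count: { $sum: 1 } } }
])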

Long running Mongo queries in Meteor

How would one go about updating thousands of documents in a collection in Meteor where forEach has to be used to first calculate the changes for each individual document?
There's a timeout of 10 minutes or so as well as a certain number of megabytes. What I've done in the past is split the updates into groups of 300 and update like that. But is there a simpler way to do it in Meteor to allow the forEach loop to run for an hour if needed?
Using percolate:synced-cron you could easily do this in batches.
SyncedCron.add({
  name: 'Update mass quantities',
  schedule: function(parser) {
    // parser is a later.parse object
    return parser.text('every 1 minute'); // or at any interval you wish
  },
  job: function() {
    var query = { notYetProcessed: true }; // or whatever your criteria are
    var batchSize = { limit: 300 }; // for example
    myCollection.find(query, batchSize).forEach(function(doc) {
      var update = { $set: { notYetProcessed: false }}; // along with everything else you want to update
      myCollection.update(doc._id, update);
    });
  }
});
This will run every minute until there are no more records to be processed. It will continue running after that of course but won't find anything to update.

Auto increment in MongoDB to store sequence of Unique User ID

I am making an analytics system; the API call would provide a Unique User ID, but it's not sequential and is too sparse.
I need to give each Unique User ID an auto-increment id to mark an analytics datapoint in a bitarray/bitset. So the first user encountered would correspond to the first bit of the bitarray, the second user would be the second bit in the bitarray, etc.
So is there a solid and fast way to generate incremental Unique User IDs in MongoDB?
As the selected answer says, you can use findAndModify to generate sequential IDs.
But I strongly disagree with the opinion that you should not do that. It all depends on your business needs. Having a 12-byte ID may be very resource consuming and cause significant scalability issues in the future.
I have a detailed answer here.
You can, but you should not
https://web.archive.org/web/20151009224806/http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
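For reference, the counters-collection pattern from that tutorial looks roughly like this (mongo shell syntax; the counter document and field names are just the conventional example):

// One counter document per sequence you need
db.counters.insert({ _id: "userid", seq: 0 });

function getNextSequence(name) {
  // Atomically increment the counter and return the new value
  var ret = db.counters.findAndModify({
    query: { _id: name },
    update: { $inc: { seq: 1 } },
    new: true
  });
  return ret.seq;
}

db.users.insert({ _id: getNextSequence("userid"), name: "Sarah C." });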
Each object in mongo already has an id, and they are sortable in insertion order. What is wrong with getting the collection of user objects, iterating over it, and using the index as the incremented ID? Or go for some kind of map-reduce job entirely.
I know this is an old question, but I shall post my answer for posterity...
It depends on the system that you are building and the particular business rules in place.
I am building a moderate to large scale CRM in MongoDb, C# (Backend API), and Angular (Frontend web app) and found ObjectId utterly terrible for use in Angular Routing for selecting particular entities. Same with API Controller routing.
The suggestion above worked perfectly for my project.
db.contacts.insert({
  "id": db.contacts.find().count() + 1,
  "name": "John Doe",
  "emails": [
    "john@doe.com",
    "john.doe@business.com"
  ],
  "phone": "555111322",
  "status": "Active"
});
The reason it is perfect for my case, but not all cases, is that (as the above comment states) if you delete 3 records from the collection, you will get collisions.
My business rules state that, due to our in-house SLAs, we are not allowed to delete correspondence data or client records for longer than the potential lifespan of the application I'm writing. Therefore, I simply mark records with an enum "Status" which is either "Active" or "Deleted". You can delete something from the UI, and it will say "Contact has been deleted", but all the application has done is change the status of the contact to "Deleted"; when the app calls the repository for a list of contacts, I filter out deleted records before pushing the data to the client app.
Therefore, db.collection.find().count() + 1 is a perfect solution for me...
It won't work for everyone, but if you will not be deleting data, it works fine.
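A rough sketch of the repository-side filter described above (the helper and collection name are hypothetical, Node driver style):

// Return only contacts that have not been soft-deleted
function getActiveContacts(db) {
  return db.collection("contacts")
    .find({ status: { $ne: "Deleted" } })
    .toArray();
}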
Edit:
In the latest versions of pymongo:
db.contacts.count() + 1
The first record should be added with "_id" = 1 in your db.

$database = "demo";
$collections = "democollaction";
echo getnextid($database, $collections);

function getnextid($database, $collections) {
    $m = new MongoClient();
    $db = $m->selectDB($database);
    $collection = $db->selectCollection($collections); // select the collection to read the highest _id from
    $cursor = $collection->find()->sort(array("_id" => -1))->limit(1);
    $array = iterator_to_array($cursor);
    foreach ($array as $value) {
        return $value["_id"] + 1;
    }
}
I had a similar issue: I was interested in generating unique numbers which can be used as identifiers, but don't have to be. I came up with the following solution. First, initialize the collection:
fun create(mongo: MongoTemplate) {
    mongo.db.getCollection("sequence")
        .insertOne(Document(mapOf("_id" to "globalCounter", "sequenceValue" to 0L)))
}
And then a service that returns unique (and ascending) numbers:
@Service
class IdCounter(val mongoTemplate: MongoTemplate) {

    companion object {
        const val collection = "sequence"
    }

    private val idField = "_id"
    private val idValue = "globalCounter"
    private val sequence = "sequenceValue"

    fun nextValue(): Long {
        val filter = Document(mapOf(idField to idValue))
        val update = Document("\$inc", Document(mapOf(sequence to 1)))
        val updated: Document = mongoTemplate.db.getCollection(collection).findOneAndUpdate(filter, update)!!
        return updated[sequence] as Long
    }
}
I believe this approach doesn't have the weaknesses related to concurrent environments that some of the other solutions may suffer from.
// await collection.insertOne({ autoIncrementId: 1 });
const { value: { autoIncrementId } } = await collection.findOneAndUpdate(
  { autoIncrementId: { $exists: true } },
  {
    $inc: { autoIncrementId: 1 },
  },
);
return collection.insertOne({ id: autoIncrementId, ...data });
I used something like nested queries in MySQL to simulate auto increment, which worked for me. To get the latest id and add one to it you can use:
lastContact = db.contacts.find().sort({ $natural: -1 }).limit(1)[0];
db.contacts.insert({
  "id": lastContact ? lastContact["id"] + 1 : 1,
  "name": "John Doe",
  "emails": ["john@doe.com", "john.doe@business.com"],
  "phone": "555111322",
  "status": "Active"
})
It solves the removal issue of Alex's answer, so no duplicate id will appear if any record is removed.
More explanation: I just get the id of the latest inserted document, add one to it, and then set it as the id of the new record. The ternary is for cases where we don't have any records yet or all of the records have been removed.
This could be another approach:
const mongoose = require("mongoose");

const contractSchema = mongoose.Schema(
  {
    account: {
      type: mongoose.Schema.Types.ObjectId,
      required: true,
    },
    idContract: {
      type: Number,
      default: 0,
    },
  },
  { timestamps: true }
);

contractSchema.pre("save", function (next) {
  var docs = this;
  mongoose
    .model("contract", contractSchema)
    .countDocuments({ account: docs.account }, function (error, counter) {
      if (error) return next(error);
      docs.idContract = counter + 1;
      next();
    });
});

module.exports = mongoose.model("contract", contractSchema);
// First check the table length
const data = await table.find()
if (data.length === 0) {
  const id = 1
  // then post your query along with your id
} else {
  // find the last item and then its id
  const length = data.length
  const lastItem = data[length - 1]
  const lastItemId = lastItem.id // or { id } = lastItem
  const id = lastItemId + 1
  // now apply the new id to your new item
  // even if you delete an item from the middle, this still works
}