Mongo - new vs processed approach - mongodb

I am new to Mongo and have gotten close to where I want to be after 3 days of banging my head against the keyboard, but now I think I may just be misunderstanding certain key concepts:
What I am trying to do:
I have a node script that is pulling in feed items from various sources very frequently and storing them (title, link, origin, processed:false)
I have another script pulling out records at random, one at a time, using them, and updating processed:true
End Goal: Items should be unique by title - if it's been seen before it should not be written to DB, and once it's been processed one time, it should never be processed again.
INSERT SCRIPT:
key = {'title':title};
data = {'origin':origin, 'title':title, 'original_link':original_url, 'processed':false};
collection.update(key, data, {upsert:true}, function(err, doc) { ...
READ SCRIPT:
collection.findOne({processed:false}, function(err, doc){
if (err) throw err;
logger.info("Read out the following item from mongodb:...");
console.dir(doc);
thisId = doc._id;
markProcessed(thisId);
});
var markProcessed = function(id) {
collection.update({ _id:id },
{
$set: {'processed':true},
}, function(err, doc){
if (err) throw err;
logger.info("Marked record:"+id+" as processed");
console.dir(doc);
}
)
};
I've tried using collection.ensureIndex({'title':1}, {unique:true}) to no success either.
As the two scripts run in parallel the read script ends up repeating work on already processed records, and although the markProcessed function was working all yesterday it miraculously does not today :)
I would very much appreciate any guidance.

There is a problem with your insert script. When you call collection.update with a plain replacement document and a document with the same key already exists, the existing document is overwritten by the new one, which also resets processed to false. A unique index doesn't prevent this, because there are never two documents with the same title in the collection at the same time.
If you don't want to overwrite an existing record, use collection.insert, which fails when the inserted document violates a unique index.
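As a rough sketch of the two write paths discussed in this thread: the first helper is an alternative to insert that uses an upsert with $setOnInsert, so an existing document is left untouched, and the second claims one unprocessed item atomically so that two readers cannot pick the same record. This assumes a MongoDB Node.js driver version where collection.updateOne and collection.findOneAndUpdate exist and return promises when no callback is passed. The names saveIfNew and claimNextItem are made up for illustration.

```javascript
// Hypothetical helpers; `collection` is assumed to be a MongoDB Node.js driver Collection.

// Insert the item only if no document with this title exists yet.
// $setOnInsert is applied only when the upsert creates a new document,
// so an existing (possibly already processed) document is not modified.
function saveIfNew(collection, item) {
  return collection.updateOne(
    { title: item.title },
    {
      $setOnInsert: {
        origin: item.origin,
        original_link: item.original_link,
        processed: false
      }
    },
    { upsert: true }
  );
}

// Find one unprocessed item and mark it processed in a single atomic operation.
// The shape of the resolved value depends on the driver version.
function claimNextItem(collection) {
  return collection.findOneAndUpdate(
    { processed: false },
    { $set: { processed: true } }
  );
}
```

Keeping the unique index on title is still a good idea as a safety net when several writers run at once.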

Meteor: Increment DB value server side when client views page

I'm trying to do something seemingly simple, update a views counter in MongoDB every time the value is fetched.
For example I've tried it with this method.
Meteor.methods({
'messages.get'(messageId) {
check(messageId, String);
if (Meteor.isServer) {
var message = Messages.findOne(
{_id: messageId}
);
var views = message.views;
// Increment views value
Messages.update(
messageId,
{ $set: { views: views++ }}
);
}
return Messages.findOne(
{_id: messageId}
);
},
});
But I can't get it to work the way I intend. For example the if(Meteor.isServer) code is useless because it's not actually executed on the server.
Also the value doesn't seem to be available after findOne is called, so it's likely async but findOne has no callback feature.
I don't want clients to control this part, which is why I'm trying to do it server side, but it needs to execute every time the client fetches the value, which sounds hard since the client has already subscribed to the data.
Edit: This is the updated method after reading the answers here.
'messages.get'(messageId) {
check(messageId, String);
Messages.update(
messageId,
{ $inc: { views: 1 }}
);
return Messages.findOne(
{_id: messageId}
);
},
"For example the if(Meteor.isServer) code is useless because it's not actually executed on the server."
Meteor methods are executed on the server. You call them from the client (with a callback), but the authoritative execution happens server side; if the method is also defined in code loaded on the client, a simulation (stub) of it runs there as well.
"Also the value doesn't seem to be available after findOne is called, so it's likely async but findOne has no callback feature."
You don't need to call it twice. See the code below:
Meteor.methods({
'messages.get'(messageId) {
check(messageId, String);
var message = Messages.findOne({_id:messageId});
if (message) {
// Increment views value on current doc
message.views++;
// Update by current doc
Messages.update(messageId,{ $set: { views: message.views }});
}
// return current doc or null if not found
return message;
},
});
You can call it from your client like this:
Meteor.call('messages.get', 'myMessageId01234', function(err, res) {
if (err || !res) {
// handle err, if res is empty, there is no message found
}
console.log(res); // your message
});
Two additions here:
1. You may split messages and views into separate collections for the sake of scalability and encapsulation of data. If your publication does not restrict itself to public fields, a client that asks for messages also receives the view count. This may work for now, but on a larger scale it may violate some future access rules.
2. views++ (post-increment) means:
- Use the current value of views, i.e. build the modifier with the current (unmodified) value.
- Then increment the variable views, which is no longer useful in your case because you do not use that variable for anything else.
Avoid these increment operators if you are not clear on exactly how they work.
Why not just use MongoDB's $inc operator, which avoids having to retrieve the previous value?
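To make the difference concrete, here is a small plain-JavaScript sketch (no Meteor or MongoDB needed) of what the two modifiers contain. The starting value 5 is an arbitrary example.

```javascript
// Post-increment evaluates to the value *before* incrementing.
let views = 5;
const setModifier = { $set: { views: views++ } };
// setModifier.$set.views is 5, so writing it would store the old value again;
// only the local variable `views` has become 6.

// With $inc, the database adds 1 itself, so no prior read is needed
// and concurrent calls do not overwrite each other's increments.
const incModifier = { $inc: { views: 1 } };
```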

Mongoose No matching document found using id() method. Error caused by asynchronous delete requests

Making asynchronous requests in a loop to delete documents from an embedded collection:
_.each deletedItem, (item) ->
item.$delete()
Erratically throws this error:
{ message: 'No matching document found.', name: 'VersionError' }
When executing:
var resume = account.resumes.id(id);
resume.remove();
account.save(function (err, acct) {
console.log(err);
if(err) return next(err);
res.send(resume);
});
After logging account.resumes and looking through the _id's of all of the resumes, the document I am attempting to find by id, exists in the collection.
530e57a7503d421eb8daca65
FIND:
{ title: 'gggff', _id: 530e57a7503d421eb8daca65 }
IN:
[{ title: 'asddas', _id: 530e57a7503d421eb8daca61 }
{ title: 'gggff', _id: 530e57a7503d421eb8daca65 }
{ title: 'ewrs', _id: 530e57a7503d421eb8daca64 }]
I assume this has to do with the fact that I am performing these requests asynchronously, or that there is a versioning issue, but I have no idea how to resolve it.
It doesn't make any sense to me how when I log the resumes, I can see the resume I attempt to find, yet if I log:
log(account.resumes.id(id));
I get undefined.
UPDATE
I've discovered that my issue is with versioning.
http://aaronheckmann.blogspot.com/2012/06/mongoose-v3-part-1-versioning.html
But I am still unsure how to resolve it without disabling versioning, which I don't want to do.
In Mongoose version 3, documents now have an increment() method which manually forces incrementation of the document version. This is also used internally whenever an operation on an array potentially alters array element position. These operations are:
$pull, $pullAll, $pop, and $set of an entire array
Changing the version key
The version key is customizable by passing the versionKey option to the Schema constructor:
new Schema({ .. }, { versionKey: 'myVersionKey' });
Or by setting the option directly:
schema.set('versionKey', 'myVersionKey');
Disabling versioning
If you don’t want to use versioning in your schema you can disable it by passing false for the versionKey option.
schema.set('versionKey', false);
The Mongoose API docs explicitly warn against disabling versioning. Your issue is due to workflow: if you update the document from the UI, send the API request, and then attempt to update it again without refreshing your object with the one returned by the backend, you'll encounter the error you are reporting. I suggest either updating the object in scope from the API response, so that __v is correctly incremented, or not sending the __v field on the PUT API request, so that it can't conflict with the version stored in the database.
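If you go the route of not sending __v, stripping it from the payload on the client could look roughly like this. It is only a sketch: withoutVersionKey is a made-up helper name, and the sample object reuses the ids from the question.

```javascript
// Hypothetical helper: return a shallow copy of `doc` without the version key.
function withoutVersionKey(doc, versionKey = '__v') {
  const { [versionKey]: _omitted, ...rest } = doc;
  return rest;
}

const payload = withoutVersionKey({
  _id: '530e57a7503d421eb8daca65',
  title: 'gggff',
  __v: 3
});
// payload still has _id and title, but no __v
```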
Another option: when requesting the object from the backend, have the API response omit the __v field, so you don't have to write frontend logic to strip it. On all your GETs for that collection, do one of the following (depending on how you write your queries):
// res is assumed to be the response object of the enclosing route handler
var model = require('model');
var qry = model.find();
qry.select('-__v'); // exclude the version key from the results
qry.exec(function(err,results){
  if(err) return res.status(500).send(err);
  res.status(200).json(results);
});
OR
var model = require('model');
model.find({}, '-__v', function(err,results){
  if(err) return res.status(500).send(err);
  res.status(200).json(results);
});

Mongoose - update after populate (Cast Exception)

I am not able to update my Mongoose document because of a CastError, which makes sense, but I don't know how to solve it.
Trip Schema:
var TripSchema = new Schema({
name: String,
_users: [{type: Schema.Types.ObjectId, ref: 'User'}]
});
User Schema:
var UserSchema = new Schema({
name: String,
email: String,
});
In my HTML page I render a trip with the possibility to add new users to it. I retrieve the data by calling the findById method on the model:
exports.readById = function (request, result) {
Trip.findById(request.params.tripId).populate('_users').exec(function (error, trip) {
if (error) {
console.log('error getting trips');
} else {
console.log('found single trip: ' + trip);
result.json(trip);
}
})
};
This works fine. In my UI I can add new users to the trip; here is the code:
var user = new UserService();
user.email = $scope.newMail;
user.$save(function(response){
trip._users.push(user._id);
trip.$update(function (response) {
console.log('OK - user ' + user.email + ' was linked to trip ' + trip.name);
// call for the updated document in database
this.readOne();
})
});
The problem is that when I update the trip, the existing users in it are populated, meaning they are stored as objects rather than ids, while the new user is stored as an ObjectId.
How can I make sure the populated users go back to ObjectIds before I update? Otherwise the update fails with a CastError.
see here for error
I've been searching around for a graceful way to handle this without finding a satisfactory solution, or at least one I feel confident is what the mongoosejs folks had in mind when using populate. Nonetheless, here's the route I took:
First, I tried to separate adding to the list from saving. So in your example, move trip._users.push(user._id); out of the $save function. I put actions like this on the client side of things, since I want the UI to show the changes before I persist them.
Second, when adding the user, I kept working with the populated model -- that is, I don't push(user._id) but instead add the full user: push(user). This keeps the _users list consistent, since the ids of other users have already been replaced with their corresponding objects during population.
So now you should be working with a consistent list of populated users. In the server code, just before calling $update, I replace trip._users with a list of ObjectIds. In other words, "un-populate" _users:
var user_ids = [];
for (var i = 0; i < trip._users.length; i++) {
  /* It might be a good idea to do more validation here, to make
   * sure you don't have any naked userIds in this array already, as you would
   * in your original code. */
  user_ids.push(trip._users[i]._id);
}
trip._users = user_ids;
trip.$update(....
As I read through your example code again, it looks like the user you are adding to the trip might be a new user? I'm not sure if that's just a relic of your simplification for question purposes, but if not, you'll need to save the user first so mongo can assign an ObjectId before you can save the trip.
I have written a function which accepts an array and passes an array of ObjectIds to its callback. To do it asynchronously in Node.js, I am using async.js. The function looks like this:
const async = require('async');
let converter = function(array, callback) {
  let idArray = []; // must be initialized before pushing
  async.each(array, function(item, itemCallback) {
    idArray.push(item._id);
    itemCallback();
  }, function(err) {
    callback(idArray);
  })
};
This works fine for me, and I hope it works for you as well.
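Since extracting ids involves no I/O, a plain synchronous Array.prototype.map is probably enough for the "un-populate" step. A minimal sketch, assuming each entry of trip._users is either a populated user object with an _id or already a bare id (toUserIds is a made-up name):

```javascript
// Hypothetical helper: replace populated user objects with their ids.
// Entries that are already bare ids are kept as they are.
function toUserIds(users) {
  return users.map(function (user) {
    var isPopulated = user && typeof user === 'object' && user._id !== undefined;
    return isPopulated ? user._id : user;
  });
}

const ids = toUserIds([{ _id: 'a1', name: 'Ann' }, 'b2']);
// ids is ['a1', 'b2']
```

Because it accepts mixed input, it also covers the case from the question where a newly pushed id sits next to populated users.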

How to put embedded document from one document into another document using Mongoose?

What I'm trying to do should be straightforward, but for some reason I'm having real difficulty figuring it out. I have the following Mongoose schemas (simplified).
var Status = new Schema({
name : { type: String, required: true },
description : { type: String }
});
var Category = new Schema({
statuses : [Status], // contains a list of all available statuses
// some other attributes
});
var Book = new Schema({
statuses : [Status], // preferably this would not be an array but a single document, but Mongoose doesn't seem to support that
// some other attributes
});
Now, I want to do the following:
Retrieve the Category document
Find a particular embedded Status document (based on request param)
Assign that particular embedded Status document to a particular Book document. I want to replace the existing Book status as at any given time there should only be one status set for a book.
Here is what I'm currently doing:
mongoose.model('Category').findOne({_id: id}, function(err, category){
if(err) return next(err);
var status = category.statuses.id(statusId); // statusId available via closure
book.statuses[0] = status; // book available via closure; trying to replace the existing status here.
book.save(function(err){
if(err) return next(err);
next();
});
});
The above seems to run fine and I don't get any errors. However, the new status is not saved to the document: the next time I output the updated Book document, it still has the old status. I debugged this, and the lookups as well as setting the status seem to be fine.
The only thing I can think of right now is that somehow the status value I'm assigning is not in the right format to be saved with Mongoose. Although, I would expect some kind of error message then.
Or maybe there is a better way to do all of this anyway?
It could be because you are trying to copy an embedded document, which itself could have an ObjectId associated with it. Trying to save the duplicate Status within the Book would create two embedded documents with the same ObjectId. Try creating a new Status object and copying the fields.
It is hard to find docs on ObjectIds for embedded documents, but they are mentioned here: http://mongoosejs.com/docs/embedded-documents.html.
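The suggestion above (build a new Status from the fields instead of reusing the embedded document) could look roughly like this. It is only a sketch: copyStatusFields is a made-up name, and the field list is taken from the Status schema in the question.

```javascript
// Hypothetical helper: copy only the schema fields and leave out _id,
// so the copy does not carry over the ObjectId of the original subdocument.
function copyStatusFields(status) {
  return { name: status.name, description: status.description };
}

const copy = copyStatusFields({ _id: 'abc', name: 'In stock', description: 'Available' });
// copy has name and description but no _id

// Possible usage with the question's variables (not run here):
// book.statuses = [copyStatusFields(status)];
```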

Iterating through database records in node.js

I'm looking to learn node.js and mongodb which look suitable for something I'd like to make. As a little project to help me learn I thought I'd copy the "posts" table from a phpbb3 forum I have into a mongodb table so I did something like this where db is mongodb database connection, and client is a mysql database connection.
db.collection('posts', function (err, data) {
  client.query('select * from phpbb_posts', function(err, rs) {
    data.insert(rs);
  });
});
This works OK when I do it on small tables, but my posts table has about 100000 rows, and this query doesn't return even when I leave it running for an hour. I suspect that it's trying to load the entire database table into memory and then insert it.
So what I would like to do is read a chunk of rows at a time and insert them. However I can't see how to read a subset of the rows in node.js, and even more of a problem, I can't understand how I can iterate through the queries one at a time when I only get notification via a callback that it's finished.
Any ideas how I can best do this? (I'm looking for solutions using node.js as I'd like to know how to solve this kind of problem, I could no doubt do it easily some other way)
You could try using the async library by caolan. The library implements some async flow control methods to handle the caveats of the callback-oriented programming style used in node.js.
For your case, the whilst method could work, using LIMIT queries against mysql and inserting the results into mongodb.
Example (not tested, as I have no test data available, but I think you'll get the idea):
var insertCount = 0;
// set this to the overall record count from mysql
var recordCount = 0;
async.whilst(
  // test condition callback
  // (async v3 expects function (cb) { cb(null, insertCount < recordCount); })
  function () { return insertCount < recordCount; },
  // actual worker callback
  function (callback) {
    db.collection('posts', function (err, data) {
      if (err) return callback(err);
      client.query('select * from phpbb_posts LIMIT ' + insertCount + ',1000', function(err, rs) {
        if (err) return callback(err);
        // stop instead of looping forever if mysql returns no more rows
        if (rs.length === 0) return callback(new Error('no more rows'));
        data.insert(rs, function (err) {
          if (err) return callback(err);
          // increment by the number of rows actually fetched
          insertCount += rs.length;
          // trigger flow callback
          callback();
        });
      });
    });
  },
  // finished callback
  function (err) {
    // finished inserting data, maybe check record count in mongodb here
  }
);
As I already mentioned, this code is just adapted from an example in the async library readme. But maybe it is an option for copying that many database records from mysql to mongo.
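The paging arithmetic behind the LIMIT approach can be checked on its own, without either database. A small sketch, where limitPages is a made-up helper and 2500 and 1000 are arbitrary example numbers:

```javascript
// Hypothetical helper: list the [offset, rowCount] pairs needed to page
// through `total` rows in chunks of at most `chunkSize` rows.
function limitPages(total, chunkSize) {
  const pages = [];
  for (let offset = 0; offset < total; offset += chunkSize) {
    pages.push([offset, Math.min(chunkSize, total - offset)]);
  }
  return pages;
}

// Each pair maps onto a MySQL "LIMIT offset, rowCount" clause.
const clauses = limitPages(2500, 1000).map(function (p) {
  return 'LIMIT ' + p[0] + ',' + p[1];
});
// clauses is ['LIMIT 0,1000', 'LIMIT 1000,1000', 'LIMIT 2000,500']
```

For stable paging, the query usually also needs an ORDER BY on a unique column, otherwise MySQL gives no guarantee that consecutive pages do not overlap.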