I have a script that populates a MongoDB database from daily server log files. The log files come from a number of servers, so the chronological order of the data is not guaranteed. To keep this simple, let's say the document schema is this:
{
    _id: <username>,
    first_seen: <date>,
    last_seen: <date>,
    most_recent_ip: <string>
}
That is, documents are keyed by the name of the user who accessed the server. For each user, we keep track of the first time the user was seen and the IP of the last visit.
Right now I handle this very inefficiently: first try an insert; if it fails, retrieve the record by _id, calculate the updated values (e.g. first_seen and most_recent_ip), and finally update the record. That is three DB calls per log entry, which makes the script's running time prohibitively long given the very high volume of data.
I'm wondering if I can replace this with a single upsert instead. I can see how to handle first_seen/last_seen: probably something like {$min: {first_seen: <log_entry_date>}} (I hope this works correctly when inserting a new doc). But how do I set most_recent_ip to the new value only when <log_entry_date> > last_seen?
Is there generally a preferred pattern for my use case?
You can just use $set to set most_recent_ip in the same upsert, e.g.
db.logs.update(
    { _id: "user1" },
    {
        $set: { most_recent_ip: "2.2.2.2" },
        $min: { first_seen: new Date() },
        $max: { last_seen: new Date() }
    },
    { upsert: true }
)
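Note that a plain $set overwrites most_recent_ip unconditionally, which is only correct when log entries are processed in chronological order. Since the question says order is not guaranteed, here is a sketch of a conditional variant using an aggregation-pipeline update (this assumes MongoDB 4.2+ and a variable entry holding one parsed log line; both are assumptions, not part of the original answer):

// sketch, assuming MongoDB 4.2+ and a parsed log line in `entry`
// with fields { user, date, ip } matching the schema above
db.logs.updateOne(
    { _id: entry.user },
    [
        {
            $set: {
                first_seen: { $min: [{ $ifNull: ["$first_seen", entry.date] }, entry.date] },
                // take the new ip only when this entry is the newest one seen so far
                most_recent_ip: {
                    $cond: [
                        { $gte: [entry.date, { $ifNull: ["$last_seen", entry.date] }] },
                        entry.ip,
                        "$most_recent_ip"
                    ]
                },
                last_seen: { $max: [{ $ifNull: ["$last_seen", entry.date] }, entry.date] }
            }
        }
    ],
    { upsert: true }
)

Either way this is one DB call per log entry instead of three; batching many such updates into a single unordered bulkWrite cuts the round trips further.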
Is there a simple or elegant method (or a query I can write) to retrieve the last-updated timestamp (of the last-updated document) in a collection? I can write a query like this to find the last inserted document:
db.collection.find().limit(1).sort({$natural:-1})
but I need information about the last updated document (it could be an insert or an update).
I know that one way is to query the oplog for the last record touching the collection. But that seems like an expensive operation, given that the oplog can be very large (and it is also not trustworthy, as it is a capped collection). Is there a better way to do this?
Thanks!
You could get the last insert time the same way you mentioned in the question:
db.collection.find().sort({'_id': -1}).limit(1)
But there isn't any good way to see the last update/delete time. If you are using replica sets, you could get that from the oplog.
Or you could add a new field to each document, such as 'lastModified'.
You can also check out collection-hooks. I hope this helps.
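For the 'lastModified' route, a minimal sketch: have every write also go through $currentDate, so the timestamp is set server-side (someId and the status field are placeholders, not from the answer):

db.collection.updateOne(
    { _id: someId },
    {
        $set: { status: "processed" },          // your actual change
        $currentDate: { lastModified: true }    // server-side timestamp
    }
)

// most recently updated document first
db.collection.find().sort({ lastModified: -1 }).limit(1)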
One way to go about it is to have a field that holds the time of the last update. You can name it updatedAt. Every time you update the document, you also set that field to the current time. If you store the time in ISO format, you'll be able to sort without issues (that's what I use).
The other way is the _id field.
Method 1
db.collection.find().limit(1).sort({updatedAt: -1})
Method 2
db.collection.find().limit(1).sort({_id: -1})
You can try:
db.collection.find().sort({$natural: -1}).limit(1);
(findOne() returns a plain document rather than a cursor, so sort() and limit() cannot be chained onto it.)
I have some unused collections in a MongoDB database. I need to find out when CRUD operations were last done against the collections in the database. We have our own _id field instead of Mongo's default ObjectId, and we don't have any time field in the collections to find the modification time. Is there any way to find the modification time of collections in MongoDB from metadata? Is there any data dictionary information, like in Oracle, to find this out? Please give me some ideas/workarounds.
To make a long story short: MongoDB has a flexible schema. Simply add a date field. Since older entries don't have it, they can not be the last entry.
Let's call that field mtime.
So after adding a date field to your schema definition, we generate an index in descending order on the new field:
db.yourCollection.createIndex({mtime: -1})
Finding the last mtime for a collection now is easy:
db.yourCollection.find({"mtime":{"$exists":true}}).sort({"mtime":-1}).limit(1)
Do this for every collection. When the above query does not return a value within the timeframe you defined for purging a collection, simply drop it, since it has not been modified since you introduced the mtime field.
After your collections are cleaned up, you may remove the mtime field from your schema definition. To remove it from the documents, you can run a simple query:
db.yourCollection.update(
    { mtime: { $exists: true } },
    { $unset: { mtime: "" } },
    { multi: true }
)
There is no "data dictionary" to get this information in MongoDB.
If you enabled profiling in advance to log all operations (db.setProfilingLevel(2)), and you haven't had so many operations that the system.profile capped collection has overwritten the entries you are interested in, you can get the information you need there; otherwise it's gone.
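For completeness, a sketch of that lookup (the database name mydb is an assumption; ns and ts are the standard system.profile fields):

// log all operations from this point on (adds overhead)
db.setProfilingLevel(2)

// most recent logged operation against one collection
db.system.profile.find({ ns: "mydb.yourCollection" }).sort({ ts: -1 }).limit(1)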
I have a collection of users with the following schema:
{
    _id: ObjectId("123...."),
    name: "user_name",
    field1: "field1 value",
    field2: "field2 value",
    etc...
}
The users are looked up by user.name, which must be unique. When a new user is added, I first perform a search, and if no such user is found, I add the new user document to the collection. The search and the conditional insert are not atomic, so with multiple application servers connected to the DB server, two add_user requests with the same user name can be received at the same time; no existing user is found for either request, and the result is two documents with the same user.name. In fact this happened (due to a bug on the client) with just a single app server running NodeJS and using the Async library.
I was thinking of using findAndModify, but that doesn't work: I'm not simply updating a field (whether it exists or not) of an existing document, where I could use upsert; I want to insert a new document only if the search finds nothing. I can't make the query "not equal to user.name", since that would match other users.
First of all, you should maintain a unique index on the name field of the users collection. This can be specified in the schema if you are using Mongoose or by using the statement:
collection.ensureIndex('name', {unique: true}, callback);
This will make sure that the name field remains unique and will solve the problem of concurrent requests described in your question. You do not need the prior search once this index is set.
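With the unique index in place (newer drivers call this createIndex), a single insert is enough: a concurrent duplicate surfaces as a duplicate-key error, code 11000, which you can treat as "user already exists". A sketch in the Node.js driver's callback style (addUser is a hypothetical wrapper, not from the question):

function addUser(collection, user, callback) {
    collection.insertOne(user, function (err, result) {
        if (err && err.code === 11000) {
            // another request inserted this name first: not a failure,
            // just report that the user already existed
            return callback(null, { existed: true });
        }
        callback(err, result);
    });
}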
I am new to Sails and MongoDB. Currently I am trying to implement CRUD functions using Sails, where I want to save user details in MongoDB. In the model I have the following attributes:
"id":{
type:'Integer',
min:100,
autoincrement:true
},
attributes: {
name:{
type:'String',
required:true,
unique:true
},
email_id:{
type:'EMAIL',
required:false,
unique:false
},
age:{
type:'Integer',
required:false,
unique:false
}
}
I want to ensure that the _id is overridden with my own values, starting from 100 and auto-incremented with each new entry. I am using the Waterline model, and when I call the API in DHC, I get the following output:
"name": "abc"
"age": 30
"email_id": "abc#gmail.com"
"id": "5587bb76ce83508409db1e57"
Here the id given is the ObjectId. Can somebody tell me how to override the ObjectId with an integer starting from 100 that is auto-incremented with every new value?
Attention: a Mongo _id should be as unique as possible in order to scale well. The default ObjectId consists of a timestamp, a machine ID, a process ID, and a random incrementing value. Keeping only the latter would make it collision-prone.
However, sometimes you badly want to prettify the never-ending ObjectId value (e.g. so it can be shown in a URL after encoding). Then you should consider using an appropriate atomic increment strategy.
Overriding the _id example:
db.testSOF.insert({_id:"myUniqueValue", a:1, b:1})
Making an Auto-Incrementing Sequence:
Use Counters Collection: basically a separate collection that keeps track of the last number in the sequence (see the sketch after this list). Personally, I have found it more cohesive to store the findAndModify function in the system.js collection, although that lacks version control's capabilities.
Optimistic Loop
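A sketch of the counters-collection technique from the MongoDB docs, seeded so the sequence starts at 100 as the question asks (the collection and sequence names are illustrative):

// a separate collection holds the last number handed out per sequence
function getNextSequence(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
    });
    return ret.seq;
}

db.counters.insert({ _id: "userid", seq: 99 }); // the first call then returns 100
db.users.insert({ _id: getNextSequence("userid"), name: "abc" });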
Edit:
I've found an issue in which the owner of sails-mongo said:
MongoDB doesn't have an auto-incrementing attribute because it doesn't support it without doing some kind of manual sequence increment on a separate collection or document. We don't currently do this in the adapter but it could be added in the future or if someone wants to submit a PR. We do something similar for sails-disk and sails-redis to get support for autoIncrementing fields.
He mentions the first technique I added in this answer:
Use Counters Collection. In the same issue, lewins shows a workaround.
I have a collection of data and I want to get it sorted by insertion time. I don't have any additional field that stores the insert time, but as I found out, I can get this time from the _id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but I get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort the collection by insertion datetime using only the _id field?
At the moment this isn't possible with Meteor, even if it is with MongoDB. The ObjectIDs created with Meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is that client-side code can insert documents, and they can arrive late on the server, so there is no guarantee the timestamp portion of the ObjectID would be accurate. On top of the latency, the client's own clock is used, meaning that if it is off you would get incorrect data. I think this is why they use an ObjectID that is completely random.
If you want to sort by date you have to store the time/date separately.
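A minimal sketch of that (the field name createdAt is an assumption):

// record the insertion time yourself on every insert
bookmarks.insert({ url: url, createdAt: new Date() });

// then sort on the stored field instead of _id
return bookmarks.find({}, { sort: { createdAt: -1 }, limit: 10 });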
The part I struck out is not accurate. Meteor uses its own ID generation, which is based on a random string, so the document I linked before does not apply. Check sasha.sochka's comment below.
Sorting just by the _id field is nearly, but not 100%, correct. As the _id is constructed, its first 4 bytes are a timestamp in seconds (so sorting by the getTimestamp() value is no better). Below one-second resolution you cannot get the exact order, as mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
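For illustration, reusing the ObjectId shown earlier on this page:

// extracts the creation time embedded in the first 4 bytes;
// the resolution is one second, so ties between inserts are possible
ObjectId("5587bb76ce83508409db1e57").getTimestamp()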
It is still true that you can check the exact order of the insert/update operations against your collection in the oplog, if you have one; but as the oplog is a capped collection, you will only see the most recent operations. http://docs.mongodb.org/manual/core/replica-set-oplog/