Sort collection by insertion datetime using only the _id field - MongoDB

I have a collection of data and I want to get it sorted by insertion time. I don't have any additional fields to store the insert time, but as I found out, I can get this time from the _id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort a collection by insertion datetime using only the _id field?

At the moment this isn't possible with Meteor, even though it is with MongoDB. The ObjectIDs created by Meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is that client-side code can insert documents, and they can arrive late at the server, so there is no guarantee that the timestamp portion of the ObjectID would be accurate. In addition to the latency, the client's clock is used, meaning that if it's off you would get incorrect data. I think this is why Meteor uses an ObjectID that is completely random.
If you want to sort by date you have to store the time/date separately.
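A minimal sketch of that approach in Meteor (the createdAt field name and someUrl value are illustrative assumptions):

// Record the insertion time explicitly when inserting
bookmarks.insert({ url: someUrl, createdAt: new Date() });

// Then sorting by insertion time is a plain sort on that field
return bookmarks.find({}, { sort: { createdAt: -1 }, limit: 10 });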

The part I struck out above is not accurate. Meteor uses its own id generation, which is based on a random string, which is why the documentation I linked above does not apply. See sasha.sochka's comment below.
Sorting on the _id field alone is nearly, but not 100%, correct. As an ObjectId is constructed, its first 4 bytes are the timestamp in seconds (so sorting on the getTimestamp() value is no better). Below one-second resolution you cannot get the exact order, as mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
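For collections whose _id is a real ObjectId, that approximate sort looks like this in the mongo shell; no getTimestamp() call is needed in the sort specification:

// ObjectIds begin with a 4-byte timestamp, so this roughly orders by insertion time
db.bookmarks.find().sort({ _id: 1 }).limit(10)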
It is still true that you can check the exact order of the insert/update operations against your collection in the oplog, if you have one, but as the oplog is a capped collection you will only see the most recent operations. http://docs.mongodb.org/manual/core/replica-set-oplog/
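A sketch of such an oplog check, assuming a replica set and that the namespace is mydb.bookmarks (both placeholders):

// The oplog lives in the local database; $natural: -1 walks it newest-first
db.getSiblingDB("local").oplog.rs.find({ ns: "mydb.bookmarks" }).sort({ $natural: -1 }).limit(5)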

Related

Firestore order by time but sort by ID

I have been trying to figure out a way to query a list of documents where I have a range filter on one field and order by another field, which of course isn't possible; see my other question: Order by timestamp with range filter on different field Swift Firestore
But is it possible to save documents with the timestamp as the ID, so they sort that way by default? Or maybe hardcode an ID, then retrieve the last created document's ID and increase it by one for the next post to be uploaded?
This shows how the documents are ordered in the collection.
Any ideas how to store documents so they are ordered by creation time in the collection?
It will order by document ID (ascending) by default in Swift.
You can use .order(by: '__id__'), but the better/documented way is with FieldPath's documentID(). I don't really know Swift, but I assume it's something like...
.order(by: FirebaseFirestore.FieldPath.documentID())
JavaScript too has an internal variable which simply returns __id__.
.orderBy(firebase.firestore.FieldPath.documentId())
Interestingly enough __name__ also works, but that sorts the whole path, including the collection name (and also the id of course).
If I correctly understood your need, by doing the following you should get the correct order:
For each document, add a specific field of type number, called, for example, sortNbr, and assign as its value a timestamp you calculate (e.g. the epoch time; see Get Unix Epoch Time in Swift)
Then build a query sorted on this field value, like:
let docRef = db.collection("xxxx")
docRef.order(by: "sortNbr")
See the doc here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
Yes, you can do this.
By default, a query retrieves all documents that satisfy the query in ascending order by document ID.
See the docs here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
So if you find a way to use a timestamp or other primary key value where the ascending lexicographical ordering is what you want, you can filter by any fields and still have the results sorted by the primary key, ascending.
Be careful to zero-pad your numbers to the maximum precision if using a numeric key like seconds since the epoch or an integer sequence: "10" is lexicographically less than "2", but greater than "02".
Using ISO formatted YYYY-mm-ddTHH:MM:SS date-time strings would work, because they sort naturally in ascending order.
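For example, in the JavaScript SDK (the posts collection name and document body are made up), an ISO date string used as the document ID sorts chronologically by default:

// ISO-8601 strings sort lexicographically in chronological order
const id = new Date().toISOString(); // e.g. "2021-03-04T12:34:56.789Z"
db.collection("posts").doc(id).set({ title: "hello" });

// A zero-padded numeric alternative, if you prefer epoch seconds
const padded = String(Math.floor(Date.now() / 1000)).padStart(12, "0");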
The order of the documents shown in the Firebase console is mostly irrelevant to the functioning of your code that uses Firestore. The console is just for browsing data, and that sorting scheme makes it relatively intuitive to find a document you might be looking for, if you know its ID. You can't change this sort order in the console.
Your code is obviously going to have other requirements, and those requirements should be coded into your queries, without regard to any sort order you see in the dashboard. If you want time-based ordering of your documents, you'll have to store some sort of timestamp field in the document and use that for ordering. I don't recommend using the timestamp as the ID of a document, as that could cause problems for you in the future.

MongoDB: How to get the last updated timestamp of the last updated document in a collection

Is there a simple or elegant method (or query that I can write) to retrieve the last updated timestamp (of the last updated document) in a collection? I can write a query like this to find the last inserted document:
db.collection.find().limit(1).sort({$natural:-1})
but I need information about the last updated document (it could be an insert or an update).
I know that one way is to query the oplog collection for the last record from a collection. But it seems like an expensive operation given the fact that oplog could be of very large size (also not trustworthy as it is a capped collection). Is there a better way to do this?
Thanks!
You could get the last insert time the same way you mentioned in the question:
db.collection.find().sort({'_id': -1}).limit(1)
But there isn't any good way to see the last update/delete time. If you are using replica sets, you could get that from the oplog.
Or you could add a new field, such as lastModified, to each document.
You can also check out collection-hooks. I hope this helps.
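A minimal sketch of the lastModified approach in the mongo shell (the collection name, someId, and the status field are placeholders):

// $currentDate stamps the server's time on every update
db.collection.updateOne(
    { _id: someId },
    { $set: { status: "read" }, $currentDate: { lastModified: true } }
)

// The most recently updated document is then simply:
db.collection.find().sort({ lastModified: -1 }).limit(1)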
One way to go about it is to have a field that holds the time of last update. You can name it updatedAt. Every time you make an update to the document, you'll just update the value to the current time. If you use the ISO format to store the time, you'll be able to sort without issues (that's what I use).
The other way is the _id field.
Method 1
db.collection.find().limit(1).sort({updatedAt: -1})
Method 2
db.collection.find().limit(1).sort({_id: -1})
You can try:
db.collection.find().sort({$natural: -1}).limit(1);

How to find last update/insert/delete operation time on mongodb collection without objectid field

I have some unused collections in the MongoDB database. I have to find out when CRUD operations were last done against the collections in the database. We have our own _id field instead of Mongo's default ObjectId. We don't have any time field in the collections to find out the modification time. Is there any way to find out the modification time of collections in MongoDB from metadata? Is there any data dictionary information, like in Oracle, to find this out? Please give some ideas/workarounds.
To make a long story short: MongoDB has a flexible schema. Simply add a date field. Since older entries don't have it, they can not be the last entry.
Let's call that field mtime.
So after adding a date field to your schema definition, we generate an index in descending order on the new field:
db.yourCollection.createIndex({mtime: -1})
Finding the last mtime for a collection now is easy:
db.yourCollection.find({"mtime":{"$exists":true}}).sort({"mtime":-1}).limit(1)
Do this for every collection. When the above query does not return a value within the timeframe you defined for purging a collection, simply drop it, since it has not been modified since you introduced the mtime field.
After your collections are cleaned up, you may remove the mtime field from your schema definition. To remove it from the documents, you can run a simple query:
db.yourCollection.update(
    { "mtime": { "$exists": true } },
    { "$unset": { "mtime": "" } },
    { multi: true }
)
There is no "data dictionary" to get this information in MongoDB.
If you've enabled profiling in advance to log all operations (db.setProfilingLevel(2)), and you haven't had so many operations that the system.profile capped collection has overwritten the entries you're interested in, you can get the information you need there; otherwise it's gone.
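If profiling was enabled, the lookup is a plain query on system.profile (the mydb.mycollection namespace is a placeholder):

db.setProfilingLevel(2) // log all operations; has a performance cost
// Most recent operation recorded against one collection, by the ts field
db.system.profile.find({ ns: "mydb.mycollection" }).sort({ ts: -1 }).limit(1)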

Should I use the timestamp in "_id"?

I need to monitor the time the records were created, for further querying and modification.
The first thing that flashed into my mind was to give the document a "createDateTime" field with a default value of "new Date()", but MongoDB says the document's _id has a timestamp embedded in it, and the id is generated when the document is created, so it seems redundant to add a new field for that.
Far too many times I've seen people set a "createDateTime" on their data, and I don't know if they know about the details of MongoDB's _id.
I want to know: should I use the _id as a "createDateTime" field? What is the best practice, and what are the pros and cons?
Thanks for any tips.
I'd actually say it depends on how you want to use the date.
For example, the timestamp in _id is not usable with the aggregation framework's date operators.
This will fail for example:
db.test.aggregate([ { $group: { _id: { $year: "$_id" } } } ])
The following error occurs:
"errmsg" : "exception: can't convert from BSON type OID to Date"
(The date cannot be extracted from the ObjectId.)
So, operations that are normally simple date operations become much more complex if you want to do any sort of date math in an aggregation. It would be far easier to have a createDateTime stamp. Counting the number of documents created in a particular year and month would be simple with aggregation over a distinct createDateTime field.
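For example, counting documents per year is trivial once a real date field exists (assuming createDateTime is a BSON date):

db.test.aggregate([
    { $group: { _id: { $year: "$createDateTime" }, count: { $sum: 1 } } }
])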
You can sort on an ObjectId, but only to some degree: the first 4 bytes are the creation timestamp, while the remaining 8 bytes aren't sortable in a meaningful way. Most MongoDB drivers default to creating the ObjectId within the driver and not on the database server. So, if you've got multiple clients (like web servers, for example) creating new documents (and new ObjectIds), the timestamps will only be as accurate as the clocks of those various servers.
Also, depending on the precision you need: an ISODate value is stored using 8 bytes, rather than the 4 used for the timestamp in an ObjectId.
Yes, you should. There is no reason not to, apart from human readability when looking directly into the database. See also here and here.
If you want to use the aggregation framework to group by the date within _id, this is not possible yet, as WiredPrairie correctly said. There is an open JIRA ticket for that which you might want to watch. But you can do this with map-reduce and ObjectId.getTimestamp(). An example of that can be found here.
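A sketch of that map-reduce workaround, counting documents by the year embedded in _id (inline output assumed for brevity):

db.test.mapReduce(
    function () { emit(this._id.getTimestamp().getFullYear(), 1); },
    function (key, values) { return Array.sum(values); },
    { out: { inline: 1 } }
)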

mongodb index strategy for range query with different fields

Almost all my documents include two fields, a start timestamp and a final timestamp. In each query I need to retrieve elements that fall within a selected period of time, so start should be after the selected value and final should be before the selected timestamp.
The query looks like:
db.collection.find({start:{$gt:DateTime(...)}, final:{$lt:DateTime(...)}})
So what is the best indexing strategy for that scenario?
By the way, which is better for performance: storing dates as datetimes or as Unix timestamps (which are just long values)?
To add a little more to baloo's answer.
On the timestamp vs. long issue: generally the MongoDB server will not see a difference. The BSON encoding length is the same (64 bits). You may see a performance difference on the client side, depending on the driver's encoding. As an example, on the Java side, using the 10gen driver, a timestamp is rendered as a Date, which is a lot heavier than a Long. There are drivers that try to avoid that overhead.
The other issue is that you will see a performance improvement if you close the range for the first field of the index. So if you use the index suggested by baloo:
db.collection.ensureIndex({start: 1, final: 1})
The query will perform (potentially much) better if it is:
db.collection.find({start:{$gt:DateTime(...),$lt:DateTime(...)},
final:{$lt:DateTime(...)}})
Conceptually, if you think of the index as a tree, the closed range limits both sides of the tree instead of just one side. Without the closed range, the server has to check all of the entries with a start greater than the timestamp provided, since it does not know the relation between start and final.
You may even find that query performance is no better than with a single-field index like:
db.collection.ensureIndex({start: 1})
Most of the savings comes from pruning on the first field. The exceptions are when the query is covered by the index, or when the ordering/sort for the results can be derived from the index.
You can use a compound index to index multiple fields:
db.collection.ensureIndex({start: 1, final: 1})
Compare different queries and indexes by using explain() to get the most out of your database.
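For example, using the index from above, you could compare the open and closed range forms (the date values are placeholders):

// Open range on start
db.collection.find({ start: { $gt: ISODate("2020-01-01") },
                     final: { $lt: ISODate("2020-02-01") } }).explain("executionStats")

// Closed range on start; compare totalKeysExamined in the two outputs
db.collection.find({ start: { $gt: ISODate("2020-01-01"), $lt: ISODate("2020-02-01") },
                     final: { $lt: ISODate("2020-02-01") } }).explain("executionStats")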