Why do we need the created_at field when the timestamp can be found in the first 4 bytes of the ObjectId?
ObjectId("5349b4ddd2781d08c09890f4").getTimestamp()
Taken from MongoDB Docs
There are several cases where it makes sense to do so:
When you need better precision - ObjectId.getTimestamp() is precise only to the second, while Date fields store milliseconds. Compare this in the mongo shell: new Date() yields ISODate("2016-01-03T21:21:38.032Z"), while ObjectId().getTimestamp() yields ISODate("2016-01-03T21:21:50Z").
When you are not using ObjectId at all - it is often taken as a given that the _id field should be populated with an ObjectId, but in fact ObjectId is only a default used by most drivers; MongoDB itself doesn't impose it. On the contrary, you are encouraged to use any "natural" unique ID if one exists for your documents. In that case, though, you will have to store the creation timestamp yourself if you need it.
Usability - if you rely on the presence of this field and the data in it, it might be better, at least from a design standpoint, to be explicit about it. This is more a matter of taste, though. However, as noted in the comments, if you also want to filter or sort by creation timestamp, it is easier to have a dedicated field for it and use query operators like $gt directly on it; see the sketch after this list.
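For illustration, a rough sketch in the mongo shell (collection and field names are made up): querying a dedicated field directly, versus building a boundary ObjectId whose leading 4 bytes encode the timestamp.

// With a dedicated field, query operators apply directly:
db.posts.find({ created_at: { $gt: ISODate("2016-01-01") } })

// Without one, pack the boundary timestamp (seconds since epoch, hex-encoded)
// into an ObjectId and zero-pad the remaining 8 bytes:
var ts = Math.floor(ISODate("2016-01-01").getTime() / 1000);
var boundary = ObjectId(ts.toString(16) + "0000000000000000");
db.posts.find({ _id: { $gt: boundary } })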
As you said, it is stated clearly in the documentation:
Since the _id ObjectId by default stores the 4 byte timestamp, in most cases you do not need to store the creation time of any document.
And you may use ObjectId("5349b4ddd2781d08c09890f4").getTimestamp() in order to get the creation date in ISO date format.
It is also a matter of convenience for the customer (us) to have a service like that, as it makes getting the creation date and performing actions on it much more intuitive and easy.
Related
I have been trying to figure out a way to query a list of documents where I have a range filter on one field and order by another field, which of course isn't possible; see my other question: Order by timestamp with range filter on different field Swift Firestore
But is it possible to save documents with the timestamp as the id, so that they would sort by default? Or maybe hardcode an ID, then retrieve the last created document's id and increase it by one for the next post to be uploaded?
This shows how the documents are ordered in the collection
Any ideas how to store documents so they are ordered by created at in the collection?
It will order by document ID (ascending) by default in Swift.
You can use .order(by: "__id__"), but the better/documented way is with FieldPath.documentID(). I don't really know Swift, but I assume it's something like...
.order(by: FirebaseFirestore.FieldPath.documentID())
JavaScript, too, has an internal helper which simply resolves to __id__:
.orderBy(firebase.firestore.FieldPath.documentId())
Interestingly enough, __name__ also works, but that sorts by the whole path, including the collection name (and also the id, of course).
If I correctly understood your need, by doing the following you should get the correct order:
For each document, add a specific field of type number, called for example sortNbr, and assign as its value a timestamp you calculate (e.g. the epoch time; see Get Unix Epoch Time in Swift)
Then build a query sorted on this field value, like:
let docRef = db.collection("xxxx")
let query = docRef.order(by: "sortNbr")
See the doc here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
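For comparison, a rough equivalent with the JavaScript web SDK (the document contents here are made up):

const db = firebase.firestore();

// Store the epoch time (milliseconds) in the document at creation:
db.collection("xxxx").add({ title: "my post", sortNbr: Date.now() });

// Read back in creation order:
db.collection("xxxx").orderBy("sortNbr").get()
  .then(snap => snap.forEach(doc => console.log(doc.id, doc.data())));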
Yes, you can do this.
By default, a query retrieves all documents that satisfy the query in ascending order by document ID.
See the docs here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
So if you find a way to use a timestamp or other primary key value where the ascending lexicographical ordering is what you want, you can filter by any fields and still have the results sorted by the primary key, ascending.
Be careful to zero-pad your numbers to the maximum precision if using a numeric key like seconds since epoch or an integer sequence: "10" is lexicographically less than "2", but "10" is greater than "02".
Using ISO-formatted YYYY-mm-ddTHH:MM:SS date-time strings would also work, because they sort naturally in ascending order; see the sketch below.
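A quick sketch of the pitfall and the fix in plain JavaScript (the 12-digit width is an arbitrary choice):

["2", "10"].sort();   // ["10", "2"]  -- lexicographic order disagrees with numeric order
["02", "10"].sort();  // ["02", "10"] -- zero-padded strings sort in numeric order

// Pad epoch seconds to a fixed width before using them as document IDs:
const id = String(Math.floor(Date.now() / 1000)).padStart(12, "0");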
The order of the documents shown in the Firebase console is mostly irrelevant to the functioning of your code that uses Firestore. The console is just for browsing data, and that sorting scheme makes it relatively intuitive to find a document you might be looking for, if you know its ID. You can't change this sort order in the console.
Your code is obviously going to have other requirements, and those requirements should be coded into your queries, without regard to any sort order you see in the console. If you want time-based ordering of your documents, you'll have to store some sort of timestamp field in the document and use that for ordering. I don't recommend using the timestamp as the ID of a document, as that could cause problems for you in the future.
I would like to store and query documents that contain a from-to date range, where the range represents an interval when the document has been valid.
Typical use cases in lucene/solr documentation address the opposite problem: Querying for documents that contain a single timestamp and this timestamp is contained in a date range provided as query parameter. (createdate:[1976-03-06T23:59:59.999Z TO *])
I want to use the edismax parser.
I have found the ms() function, which seems to me to be designed for boosting score only, not to eliminate non-matching results entirely.
I have found the article Spatial Search Tricks for People Who Don't Have Spatial Data, where the problem described by me is said to be Easy... (Find People Alive On May 25, 1977).
Is there any simpler way to express something like
date_from_query:[valid_from_field TO valid_to_field] than using the spatial approach?
The most direct approach is to create the bounds yourself:
valid_from_field:[* TO date_from_query] AND valid_to_field:[date_from_query TO *]
.. which would give you documents where the valid_from_field is earlier than the date you're querying and the valid_to_field is later than it - in effect, matching documents whose validity interval contains the query date. This assumes that neither field is multi-valued.
I'd probably add it as a filter query, since you don't need any scoring from it, and you probably want to allow other search queries at the same time.
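As a filter query it could look like this (field names from the question; the date is just a placeholder, echoing the "Find People Alive On May 25, 1977" example):

fq=valid_from_field:[* TO 1977-05-25T00:00:00Z] AND valid_to_field:[1977-05-25T00:00:00Z TO *]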
I need to monitor when records are created, for further querying and modification.
The first thing that flashed into my mind was to give the document a "createDateTime" field with a default value of "new Date()", but MongoDB says the document _id has a timestamp embedded in it, and the id is generated when the document is created, so it seems redundant to add a new field for that.
Far too many times I've seen people set a "createDateTime" on their data, and I don't know if they are aware of the details of MongoDB's _id.
I want to know: should I use the _id as a "createDateTime" field? What is the best practice, and what are the pros and cons?
Thanks for any tips.
I'd actually say it depends on how you want to use the date.
For example, it's not usable with the aggregation framework's Date operators.
This will fail, for example:
db.test.aggregate([ { $group : { _id: { $year: "$_id" } } } ])
The following error occurs:
"errmsg" : "exception: can't convert from BSON type OID to Date"
(The date cannot be extracted from the ObjectId.)
So, operations that are normally simple date operations become much more complex if you want to do any sort of date math in an aggregation. It would be far easier to have a createDateTime stamp. Counting the number of documents created in a particular year and month, for example, is simple using aggregation with a dedicated createDateTime field.
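That count becomes a short pipeline (a sketch; collection and field names assumed):

db.test.aggregate([
  { $group: {
      _id: { year: { $year: "$createDateTime" }, month: { $month: "$createDateTime" } },
      count: { $sum: 1 }
  } }
])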
You can sort on an ObjectId, to some degree, but the remaining 8 bytes of the ObjectId aren't sortable in a meaningful way. Most MongoDB drivers default to creating the ObjectId within the driver and not on the database, so if you've got multiple clients (like web servers, for example) creating new documents (and new ObjectIds), the timestamps will only be as accurate as the clocks of those various servers.
Also, depending on the precision you need, an ISODate value is stored using 8 bytes, rather than the 4 used for the timestamp portion of an ObjectId.
Yes, you should. There is no reason not to, besides reduced human readability when looking directly into the database. See also here and here.
If you want to use the aggregation framework to group by the date within _id, this is not possible yet, as WiredPrairie correctly said. There is an open JIRA ticket for that which you might want to watch. But of course you can do this with map-reduce and ObjectId.getTimestamp(); an example for that can be found here.
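A sketch of that map-reduce route (collection name assumed), counting documents per creation year extracted from _id:

db.test.mapReduce(
  function () { emit(this._id.getTimestamp().getFullYear(), 1); },  // map: key by creation year
  function (key, values) { return Array.sum(values); },             // reduce: sum the counts
  { out: { inline: 1 } }
)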
I have a collection of data and I want to get it sorted by insertion time. I don't have any additional field to store the insert time, but as I found out, I can get this time from the _id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but I get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort the collection by insertion datetime using only the _id field?
At the moment this isn't possible with Meteor, even if it is with MongoDB. The ObjectIDs created with Meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is that client-side code can insert documents, and they can arrive late on the server, so there is no guarantee the timestamp portion of the ObjectID will be accurate. In addition to the latency, the client side's date is used, meaning that if their clock is off you're going to get incorrect data. I think this is the reason they use an ObjectID, but it is completely random.
If you want to sort by date you have to store the time/date separately.
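A minimal sketch (the createdAt field name is my choice; the collection follows the question):

bookmarks.insert({ url: "https://example.com", createdAt: new Date() });
return bookmarks.find({}, { sort: { createdAt: -1 }, limit: 10 });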
The part I struck out is not accurate: Meteor uses its own id generation, which is based on a random string, so the doc I linked before does not apply. Check sasha.sochka's comment below.
It is nearly, but not 100%, reliable to just sort on the _id field. As it is constructed, the first 4 bytes are the timestamp in seconds (so sorting on the getTimestamp() value is no better). Below one-second resolution you cannot get the exact order, as mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
It is still true that you can try to check the exact order of the insert/update ops against your collection in the oplog, if you have one, but as it is a capped collection you will only see the most recent operations. http://docs.mongodb.org/manual/core/replica-set-oplog/
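If you do have a replica set, a sketch of peeking at recent inserts (the database/collection namespace here is assumed):

db.getSiblingDB("local").oplog.rs.find({ op: "i", ns: "mydb.bookmarks" }).sort({ $natural: -1 }).limit(10)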
Almost all my documents include 2 fields: a start timestamp and a final timestamp. In each query, I need to retrieve elements that fall within a selected period of time, so start should be after the selected value and final should be before the selected timestamp.
The query looks like:
db.collection.find({start:{$gt:DateTime(...)}, final:{$lt:DateTime(...)}})
So what is the best indexing strategy for that scenario?
By the way, which is better for performance - storing dates as datetimes or as unix timestamps (which are just long values)?
To add a little more to baloo's answer.
On the timestamp vs. long issue: generally the MongoDB server will not see a difference, as the BSON encoding length is the same (64 bits). You may see a performance difference on the client side depending on the driver's encoding. As an example, on the Java side, using the 10gen driver, a timestamp is rendered as a Date, which is a lot heavier than a Long. There are drivers that try to avoid that overhead.
The other issue is that you will see a performance improvement if you close the range for the first field of the index. So if you use the index suggested by baloo:
db.collection.ensureIndex({start: 1, final: 1})
The query will perform (potentially much) better if it is:
db.collection.find({start: {$gt: DateTime(...), $lt: DateTime(...)},
                    final: {$lt: DateTime(...)}})
Conceptually, if you think of the index as a tree, the closed range limits both sides of the tree instead of just one side. Without the closed range, the server has to check all of the entries with a start greater than the timestamp provided, since it does not know the relation between start and final.
You may even find that the query performance is no better than using a single-field index like:
db.collection.ensureIndex({start: 1})
Most of the savings comes from pruning on the first field. The exception is when the query is covered by the index, or when the ordering/sort for the results can be derived from the index.
You can use a Compound index in order to create an index for multiple fields.
db.collection.ensureIndex({start: 1, final: 1})
Compare different queries and indexes by using explain() to get the most out of your database.
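For instance, a sketch comparing the compound index against the question's query (the concrete dates are placeholders I chose):

db.collection.ensureIndex({ start: 1, final: 1 })
db.collection.find({
  start: { $gt: ISODate("2014-01-01"), $lt: ISODate("2014-02-01") },
  final: { $lt: ISODate("2014-02-01") }
}).explain()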