Mongo DB sort behavior - mongodb

Let's say we have a collection of invoices and that we query and sort by sales date.
There can of course be many invoices on the same date.
Does Mongo provide any guarantee consistent order of the invoices for the same date?
e.g. does it also provide a default sort on say _id, or is the behavior described as undefined?
If one were to run the same query multiple times, would the invoices on the same date come in the same order each time?
Or is it up to the developer to also provide a secondary sort property, eg. _id. ?
To me, it looks like it is consistent, but can I really count on that?

1.Does Mongo provide any guarantee consistent order of the invoices for the same date?
Yes, results will be consistent all the time
Does it also provide a default sort on say _id, or is the behavior described as undefined?
By default all records will be sorted by `_id`, that's why i can say Yes to your first question.
If one were to run the same query multiple times, would the invoices on the same date come in the same order each time?
yes, always
is it up to the developer to also provide a secondary sort property, eg. _id. ?
yes
For my experiment results check attached screenshots.

Related

Firestore order by time but sort by ID

I have been trying to figure out a way to query a list of documents where I have a range filter on one field and order by another field which of course isn't possible, see my other question: Order by timestamp with range filter on different field Swift Firestore
But is it possible to save documents with the timestamp as id and then it would sort by default? Or maybe hardcode an ID, then retrieve the last created document id and increase id by one for the next post to be uploaded?
This shows how the documents is ordered in the collection
Any ideas how to store documents so they are ordered by created at in the collection?
It will order by document ID (ascending) by default in Swift.
You can use .order(by: '__id__') but the better/documented way is with FieldPath documentID() I don't really know Swift but I assume that it's something like...
.order(by: FirebaseFirestore.FieldPath.documentID())
JavaScript too has an internal variable which simply returns __id__.
.orderBy(firebase.firestore.FieldPath.documentId())
Interestingly enough __name__ also works, but that sorts the whole path, including the collection name (and also the id of course).
If I correctly understood your need, by doing the following you should get the correct order:
For each document, add a specific field of type number, called for example sortNbr and assign as value a timestamp you calculate (e.g. the epoch time, see Get Unix Epoch Time in Swift)
Then build a query sorted on this field value, like:
let docRef = db.collection("xxxx")
docRef.order(by: "sortNbr")
See the doc here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
Yes, you can do this.
By default, a query retrieves all documents that satisfy the query in
ascending order by document ID.
See the docs here: https://firebase.google.com/docs/firestore/query-data/order-limit-data
So if you find a way to use a timestamp or other primary key value where the ascending lexicographical ordering is what you want, you can filter by any fields and still have the results sorted by the primary key, ascending.
Be careful to zero-pad your numbers to the maximum precision if using a numeric key like seconds since epoch or an integer sequence. 10 is lexicographical less than 2, but 10 is greater than 02.
Using ISO formatted YYYY-mm-ddTHH:MM:SS date-time strings would work, because they sort naturally in ascending order.
The order of the documents shown in the Firebase console is mostly irrelevant to the functioning of your code that uses Firestore. The console is just for browsing data, and that sorting scheme makes it relatively intuitive to find a document you might be looking for, if you know its ID. You can't change this sort order in the console.
Your code is obviously going to have other requirements, and those requirements should be coded into your queries, without regarding any sort order you see in the dashboard. If you want time-based ordering of your documents, you'll have to store some sort of timestamp field in the document, and use that for ordering. I don't recommend using the timestamp as the ID of a document, as that could cause problems for you in the future.

how to join a collection and sort it, while limiting results in MongoDB

lets say I have 2 collections wherein each document may look like this:
Collection 1:
target:
_id,
comments:
[
{ _id,
message,
full_name
},
...
]
Collection 2:
user:
_id,
full_name,
username
I am paging through comments via $slice, let's say I take the first 25 entries.
From these entries I need the according usernames, which I receive from the second collection. What I want is to get the comments sorted by their reference username. The problem is I can't add the username to the comments because they may change often and if so, I would need to update all target documents, where the old username was in.
I can only imagine one way to solve this. Read out the entire full_names and query them in the user collection. The result would be sortable but it is not paged and so it takes a lot of resources to do that with large documents.
Is there anything I am missing with this problem?
Thanks in advance
If comments are an embedded array, you will have to do work on the client side to sort the comments array unless you store it in sorted order. Your application requirements for username force you to either read out all of the usernames of the users who commented to do the sort, or to store the username in the comments and have (much) more difficult and expensive updates.
Sorting and pagination don't work unless you can return the documents in sorted order. You should consider a different schema where comments form a separate collection so that you can return them in sorted order and paginate them. Store the username in each comment to facilitate the sort on the MongoDB side. Depending on your application's usage pattern this might work better for you.
It also seems strange to sort on usernames and expect/allow usernames to change frequently. If you could drop these requirements it'd make your life easier :D

Sort collection by insertion datetime using only id field

I have a collection of data and I want to get it sorted by insertion time. I have not any additional fields to store the insert time. But as I found out I can get this time from Id.
I have tried this code:
return bookmarks.find({}, {sort: {_id.getTimestamp(): 1}, limit: 10});
or
return bookmarks.find({}, {sort: {ObjectId(_id).getTimestamp(): 1}, limit: 10});
but get the error message:
=> Your application has errors. Waiting for file change.
Is there any way to sort collection by insertion datetime using only id field ?
At the moment this isn't possible with Meteor, even if it is with MongoDB. The ObjectID's created with meteor don't bear a timestamp. See http://docs.meteor.com/#collection_object_id
The reason for this is client side code can insert code and it can arrive late on the server, hence there is no guarantee the timestamp portion of the ObjectID will be accurate. In addition to the latency the client side's date is used meaning if they're off it's going to get you incorrect data. I think this is the reason they use an ObjectID but it is completely random.
If you want to sort by date you have to store the time/date separately.
The part what i striked out is not accurate. Meteor use it is own id generation which is based on a random string that is while does not apply the doc what i linked before. Check sasha.sochka's comment under.
It is nearly but not 100% good if you just sort for the _id field . While as it is constructed the first 4 byte is the timestamp in secs (so sorting for the getTimestamps value is not better). Under one second resolution you cannot get the exact order, as it is mentioned in the documentation: http://docs.mongodb.org/manual/reference/object-id/#objectid
It is still true that you can try to check the exact order of the insert/update ops against your collection in the oplog, if you have an oplog, but as it is a capped collection anyway you will see the recent operations only. http://docs.mongodb.org/manual/core/replica-set-oplog/.

mongoDB group by/distinct query

model checkin:
checkin
_id
interest_id
author_id
I've got a collection of checkins (resolved by simple "find" query)
I'd like to count the number of checkins for each interest.
What makes the task a bit more difficult - we should count two checkins from one person and one interest as one checkin.
AFAIK, group operations in mongo are performed by map/reduce query. Should we use it here? The only idea I've got with such an approach is to aggregate the array of users for each interest and then return this array's length.
EDIT I ended up with not using map/reduce at all, allthough Emily's answer worked fine & quick.
I have to select only checkins from last 60 minutes, and there shouldn't be too many results. So I just get all of them to Ruby driver, and do all the calculation's on ruby side. It's a bit slower, but much more scalable and easy-to-understand.
best,
Roman
Map reduce would probably be the way to go for this and you could get the desired results with two map reduces.
In the first, you could remove duplicate author_id and interest_id pairs.
key would be author_id and interest_id
values would be checkin_id
The second map reduce would just be a count of the number of checkins by a given author_id.
key would be author_id
value would be checkin_id count

Relative staleness of a record

I have votes in my system, but I want to sort by some calculation of votes and freshness of the item.
I know that in SQL, it should be quite easy to do something like this - with GETDATE(), DATEDIFF() and a bit of math magic.
How would I go about doing this in Mongo? Will it require to use map-reduce? I remember hearing that map-reduce is blocking - so it's not recommended for high-traffic website page queries?
Thank you
In MongoDB you can sort on the value of any of the fields in a document, but you can't sort on a calculated value.
However, in this case, couldn't you just sort on the date (descending) instead of the freshness? Wouldn't that give the same sort order?