I have votes in my system, but I want to sort by some calculation of votes and freshness of the item.
I know that in SQL this would be quite easy with GETDATE(), DATEDIFF(), and a bit of math.
How would I go about doing this in Mongo? Will it require using map-reduce? I remember hearing that map-reduce is blocking, so it isn't recommended for queries on high-traffic pages?
Thank you
In MongoDB you can sort on the value of any of the fields in a document, but you can't sort on a calculated value.
However, in this case, couldn't you just sort on the date (descending) instead of the freshness? Wouldn't that give the same sort order?
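A minimal sketch of that suggestion in the mongo shell (the collection and field names are assumptions):

```javascript
// Freshness only: newest items first, _id as a deterministic tie-breaker
db.items.find().sort({ createdAt: -1, _id: 1 })

// If votes should dominate and freshness break ties, a plain
// two-field sort still avoids any computed value:
db.items.find().sort({ votes: -1, createdAt: -1 })
```

(Newer MongoDB versions can sort on computed values via the aggregation pipeline, e.g. $addFields followed by $sort, but for a pure freshness ordering the plain date sort above is enough.)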
I'm working on my app and I just ran into a dilemma regarding the best way to handle indexes for Firestore.
I have a query that searches for publications in a specific community that contain at least one of the tags and fall within a geohash range. The index for that query looks like this:
`community` Ascending, `tag` Ascending, `location.geohash` Ascending
Now if my user doesn't need to filter by tag, I run the query without the arrayContains(tag) clause, which prompts me to create another index:
`community` Ascending, `location.geohash` Ascending
My question is: is it better to create that second index, or to just use the first one and specify all possible tags in arrayContains when the user wants no tag filter?
Neither is inherently better; it's a typical space vs. time tradeoff.
Adding the extra tags in the query adds some overhead there, but it saves you the (storage) cost for the additional index. So you're trading some small amount of runtime performance for a small amount of space/cost savings.
One thing to check is whether the query with tags can actually run on just the second index, as Firestore may be able to do a zigzag merge join. In that case you could keep only the second, smaller index, avoid the overhead of adding extra clauses to the tag-less query, and accept a (similarly small) performance difference on the queries where you do specify one or more tags.
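As a sketch with the Firebase web (modular) SDK — the collection name `publications` and all variable names here are assumptions — the two variants look like this:

```javascript
import { collection, query, where, orderBy, startAt, endAt, getDocs } from "firebase/firestore";

// Variant 1: community + tags + geohash range
// (backed by the index: community ASC, tag ASC, location.geohash ASC)
const withTags = query(
  collection(db, "publications"),
  where("community", "==", communityId),
  where("tag", "array-contains-any", selectedTags),
  orderBy("location.geohash"),
  startAt(geohashLower),
  endAt(geohashUpper)
);

// Variant 2: same query without the tag clause
// (this is the one that prompts the second index)
const withoutTags = query(
  collection(db, "publications"),
  where("community", "==", communityId),
  orderBy("location.geohash"),
  startAt(geohashLower),
  endAt(geohashUpper)
);

const snapshot = await getDocs(withoutTags);
```

One practical caveat for the "pass all possible tags" option: array-contains-any accepts only a limited number of values per query (historically 10), so it stops being viable once the tag vocabulary grows past that cap.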
Let's say we have a collection of invoices and that we query and sort by sales date.
There can of course be many invoices on the same date.
Does Mongo provide any guarantee of a consistent order for invoices with the same date?
E.g. does it also apply a default sort on, say, _id, or is the behavior undefined?
If one were to run the same query multiple times, would the invoices on the same date come in the same order each time?
Or is it up to the developer to also provide a secondary sort property, e.g. _id?
To me, it looks like it is consistent, but can I really count on that?
1. Does Mongo provide any guarantee of a consistent order for invoices with the same date?
No, not for documents that tie on the sort key. Their relative order is undefined; repeated runs often return the same order in practice, but that is an implementation detail, not a guarantee, and it can change with a different query plan or after writes.
2. Does it also provide a default sort on, say, `_id`, or is the behavior undefined?
There is no implicit secondary sort on `_id`. Without any sort at all, documents come back in natural (storage) order, which is likewise not guaranteed to be stable.
3. If one were to run the same query multiple times, would the invoices on the same date come in the same order each time?
Usually, but you cannot rely on it.
4. Is it up to the developer to also provide a secondary sort property, e.g. `_id`?
Yes. Adding a unique field such as `_id` as a tie-breaker is the only way to make the order deterministic.
In my own experiments the order looked consistent, but that is exactly the kind of behavior you shouldn't count on.
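For the concrete case, a tie-breaking secondary sort looks like this in the mongo shell (the field name `salesDate` is an assumption):

```javascript
// Equal sales dates are ordered by the unique _id, so the
// result order is the same on every execution
db.invoices.find().sort({ salesDate: 1, _id: 1 })
```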
I'm struggling to figure out how to properly sort data from MongoDB. The collection uses a 2dsphere index and has a createdAt timestamp. The goal is to show the latest pictures (that's what this collection is about, plus a mediaUrl field...), but they also have to be close to the user. I'm not very familiar with complex MongoDB aggregation queries, so I thought this is a good place to ask. Sorting with $near shows items sorted only by distance, but there's also the upload time: e.g. if an item is 5 minutes fresh but, say, 500 meters farther away than an older item, it should still sort higher.
An ugly way would be to iterate outward every few hundred meters and collect data, but maybe there's a smarter way?
So if I understand correctly, you want to be able to sort on 2 fields:
- distance
- timestamp
You should check out the $sort aggregation stage:
https://docs.mongodb.com/manual/reference/operator/aggregation/sort/
It allows you to sort on multiple fields.
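To make that concrete, a sketch of one such pipeline (the `pictures` collection name, the GeoJSON `location` field behind the 2dsphere index, and the tunable `WEIGHT` constant are all assumptions):

```javascript
db.pictures.aggregate([
  // $geoNear must be the first stage; it also computes the distance
  {
    $geoNear: {
      near: { type: "Point", coordinates: [userLng, userLat] },
      distanceField: "distance",   // meters from the user
      maxDistance: 5000,           // ignore anything beyond 5 km
      spherical: true
    }
  },
  // Blend freshness and proximity into a single score:
  // upload time in ms, minus WEIGHT ms per meter of distance
  {
    $addFields: {
      score: {
        $subtract: [{ $toLong: "$createdAt" }, { $multiply: ["$distance", WEIGHT] }]
      }
    }
  },
  { $sort: { score: -1 } },
  { $limit: 50 }
])
```

If a combined score is overkill, a plain two-field sort after $geoNear ({ $sort: { createdAt: -1, distance: 1 } }) only uses distance to break ties between equally fresh items.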
I would like to store and query documents that contain a from-to date range, where the range represents an interval when the document has been valid.
Typical use cases in lucene/solr documentation address the opposite problem: Querying for documents that contain a single timestamp and this timestamp is contained in a date range provided as query parameter. (createdate:[1976-03-06T23:59:59.999Z TO *])
I want to use the edismax parser.
I have found the ms() function, which seems to me to be designed for boosting score only, not to eliminate non-matching results entirely.
I have found the article Spatial Search Tricks for People Who Don't Have Spatial Data, where the problem described by me is said to be Easy... (Find People Alive On May 25, 1977).
Is there any simpler way to express something like
date_from_query:[valid_from_field TO valid_to_field] than using the spatial approach?
The most direct approach is to create the bounds yourself:
valid_from_field:[* TO date_from_query] AND valid_to_field:[date_from_query TO *]
.. which would give you documents where the valid_from_field is earlier than the date you're querying and the valid_to_field is later than it; in effect, matching documents whose validity interval contains the query date. This assumes that neither field is multi-valued.
I'd probably add it as a filter query, since you don't need any scoring from it, and you probably want to allow other search queries at the same time.
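Using the "alive on May 25, 1977" example from the article the question cites, the filter query would look like:

```
fq=valid_from_field:[* TO 1977-05-25T00:00:00Z] AND valid_to_field:[1977-05-25T00:00:00Z TO *]
```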
The checkin model:
```
checkin
  _id
  interest_id
  author_id
```
I've got a collection of checkins (resolved by simple "find" query)
I'd like to count the number of checkins for each interest.
What makes the task a bit more difficult: two checkins from the same person on the same interest should count as a single checkin.
AFAIK, group operations in mongo are performed by map/reduce query. Should we use it here? The only idea I've got with such an approach is to aggregate the array of users for each interest and then return this array's length.
EDIT: I ended up not using map/reduce at all, although Emily's answer worked fine and fast.
I have to select only checkins from the last 60 minutes, and there shouldn't be too many results. So I just fetch all of them through the Ruby driver and do all the calculations on the Ruby side. It's a bit slower, but much more scalable and easier to understand.
best,
Roman
Map reduce would probably be the way to go for this and you could get the desired results with two map reduces.
In the first, you could remove duplicate author_id and interest_id pairs.
key would be author_id and interest_id
values would be checkin_id
The second map reduce would then count those deduplicated pairs per interest.
key would be interest_id
value would be the count of distinct authors (i.e. deduplicated checkins)
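The same two-phase idea can also be expressed with the aggregation framework, which has since largely replaced map-reduce for grouping; a minimal mongo shell sketch, assuming a `checkins` collection:

```javascript
db.checkins.aggregate([
  // Phase 1: collapse duplicate (author_id, interest_id) pairs
  { $group: { _id: { author: "$author_id", interest: "$interest_id" } } },
  // Phase 2: count the distinct authors per interest
  { $group: { _id: "$_id.interest", checkins: { $sum: 1 } } }
])
```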