mongo query based on field calculation

I am looking for a way to query Mongo for documents where a calculation between two fields, compared against a variable, determines the match.
For example, overlapping date ranges. I have a document with the following schema:
{startDate : someDate, endDate : otherDate, restrictions : {daysBefore : 5, daysAfter : 5}}
My user will supply their own date range like
var userInfo = { from : Date, to : Date}
I need the documents that satisfy this condition:
startDate - restrictions.daysBefore <= userInfo.to && endDate + restrictions.daysAfter >= userInfo.from;
I tried using a $where clause, but I lose the context of to and from, since they are defined outside the scope of the where function.
I would like to do this without pulling down all of the results, or creating another field upon insert.
Is there a simple way this query can be done?

The aggregation framework [AF] will do what you want. The AF backend is written in C++ and is therefore much faster than JavaScript, as an added bonus. Beyond speed, there are a number of reasons we discourage the use of $where, some of which can be found in the $where docs.
The AF docs (i.e. the good stuff to use):
http://docs.mongodb.org/manual/reference/aggregation/
I am uncertain of the format of the data you are storing, and this will also have an effect on performance. For instance, if the date is the standard date of milliseconds since Jan 1st 1970 (the Unix epoch) and daysBefore is stored as (milliseconds per day) * (number of days), you can use simple math, as the example below does. This is very fast. If not, there are date conversions available in the AF, but doing those conversions in addition to computing the differences is of course more expensive.
In Python (your profile mentions Django), datetime.timedelta can be used for daysBefore. For instance, for 5 days:
import datetime
daysBefore=datetime.timedelta(5)
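# a sketch: convert to the (milliseconds per day) * (number of days) format described above
daysBeforeMs = int(daysBefore.total_seconds() * 1000)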
There are two main ways to go about this in the AF: do the calculation directly and match on it, or project a new field and match against that. Testing against your specific use case will be necessary for complicated or large-scale deployments. An aggregate command from the shell (the same document structure works from Python/PyMongo) to match against the calculation:
fromDate=<program provided>
// $expr (MongoDB 3.6+) is needed so $match evaluates the computed comparison
db.collection.aggregate([{"$match" : {"$expr" : {"$lt" : ["$startDate", {"$add" : ["$restrictions.daysBefore", fromDate]}]}}}])
If you want to run multiple calculations in the same $match use $and:[{}, {}, …, {}]. I omitted that for clarity.
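For illustration, here is a sketch of the complete two-sided overlap check from the question, assuming the millisecond storage format described above (userInfo.from and userInfo.to are millisecond values supplied by your program):
db.collection.aggregate([
    { "$match" : { "$expr" : { "$and" : [
        // startDate - restrictions.daysBefore <= userInfo.to
        { "$lte" : [ { "$subtract" : ["$startDate", "$restrictions.daysBefore"] }, userInfo.to ] },
        // endDate + restrictions.daysAfter >= userInfo.from
        { "$gte" : [ { "$add" : ["$endDate", "$restrictions.daysAfter"] }, userInfo.from ] }
    ] } } }
])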
Further aggregation documentation for the AF can be found at:
http://api.mongodb.org/python/current/examples/aggregation.html#aggregation-framework
Note that “aggregation” in Mongo also includes map-reduce, but in this case the AF should be able to do it all (and much more quickly).
If you need any further information about the AF or if there is anything the docs don’t make clear, please don’t hesitate to ask.
Best,
Charlie

Related

MongoDB collection field compare with Current date to get the active users count in a day

var dt= new Date();
var datestring = dt.getFullYear()+"-"+("0"+(dt.getMonth()+1)).slice(-2)+"-"+("0"+dt.getDate()).slice(-2);
db.getCollection('Profile Section').count({Last_Active : {$regex : datestring}})
I have written this query in MongoDB and it gives the correct count value, so the query itself is right. I am using Spring Boot for the backend and calling the REST API from Postman; how can I write the equivalent of this query in Java?
Can you suggest how to write the equivalent Spring Boot backend using this query? I also have to insert the resulting count value into another collection called ActiveUsers in MongoDB.
@GetMapping("/Activeusers")
is the api call in my JAVA application.
You have to use the $gte & $lt query operators.
db.getCollection('Profile').find({
"lastActive" : {
$gte: new Date('2020-05-19'),
$lt: new Date('2020-05-20'),
}
}).pretty()
In Spring,
@Query("{'lastActive' : { $gte: ?0, $lt: ?1 } }")
public List<Profile> getProfileByLastActive(LocalDate today, LocalDate nextDay);
Use either LocalDate or Date as per your convenience.
If you are going to implement this in Node.js, the best option is to use moment.js. I have been working with moment.js for calendar-related activities, which is why I suggest it.
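As for the second part of the question, inserting the resulting count into the ActiveUsers collection, here is a minimal shell sketch (the day and count field names are assumptions, not from your schema); in Spring the same can be done with MongoTemplate's upsert:
var start = new Date('2020-05-19'), end = new Date('2020-05-20');
var activeCount = db.getCollection('Profile').count({ lastActive : { $gte : start, $lt : end } });
// upsert so re-running for the same day updates the existing document
db.getCollection('ActiveUsers').updateOne(
    { day : start },
    { $set : { count : activeCount, computedAt : new Date() } },
    { upsert : true }
);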
Your code is accurate. The problem: it's too accurate.
If you were to update those records every millisecond, you would be able to fetch them this way. But that is overkill for most usages, and this is where the architecture of your system matters.
Querying a date field with an $eq operator fetches results with an exact match, at millisecond precision. The "active users" logic might tempt us to check which users are active right now, and intuition says to use an $eq operator. While this is correct, we will miss lots of users who are active but whose records in the db are not updated at a millisecond rate (this depends on how you update your db records).
As implied above, one solution would be to flood the db with updates just to keep an accurate, near-real-time picture of active users. This might be too much for many systems.
Another solution is to query the active users over an interval/gap of a few seconds (or more). Every second added to the interval covers another 1,000 milliseconds' worth of records, greatly increasing the chance of catching active users. You can see this here:
db.getCollection('Profile').find({"lastActive" : {$gte: new Date(ISODate().getTime() - 1000 * 60 * 2)}}).pretty() // fetches records from the last 2 minutes

Why do we need created_at in MongoDB

Why do we need the created_at field when the timestamp can be found in the first 4 bytes of the ObjectId?
ObjectId("5349b4ddd2781d08c09890f4").getTimestamp()
Taken from MongoDB Docs
There are several cases where it makes sense to do so:
When you need better precision - ObjectId.getTimestamp() is precise up to seconds, while Date fields store milliseconds. Compare this in mongo shell: new Date() yields ISODate("2016-01-03T21:21:38.032Z"), while ObjectId().getTimestamp() yields ISODate("2016-01-03T21:21:50Z").
When you are not using ObjectId at all - it is often taken as a given that _id field should be populated with ObjectId, while in fact ObjectId is only a default used by most of the drivers and MongoDB itself doesn't impose it - on the contrary, it is encouraged to use any "natural" unique ID if it exists for the documents. In this case though you will have to store "creation timestamp" yourself if you need it.
Usability - if you rely on the presence of this field and the data in it, it might be better, at least from design standpoint, to be explicit about it. This is more a matter of taste though. However, as noted in comments, if you also want to filter or sort by "creation timestamp" - it will be easier to do having a dedicated field for it and using query operators like $gt, for example, directly on it.
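To make the filter/sort point concrete, here is a sketch contrasting the two approaches (collection and field names are hypothetical). Without a dedicated field you must hand-build an ObjectId for the boundary date; with one, it's a plain range query:
// without created_at: construct the smallest ObjectId for a given date
var since = new Date("2016-01-01");
var hexSeconds = Math.floor(since.getTime() / 1000).toString(16);
db.test.find({ _id : { $gt : ObjectId(hexSeconds + "0000000000000000") } });
// with a dedicated field: direct and index-friendly
db.test.find({ createdAt : { $gt : since } });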
As you said, and as it states clearly in the documentation:
Since the _id ObjectId by default stores the 4 byte timestamp, in most cases you do not need to store the creation time of any document.
And you may use ObjectId("5349b4ddd2781d08c09890f4").getTimestamp() in order to get the creation date in ISO date format.
It is also a matter of convenience for the customer (us) to have a service like that, as it makes getting the creation date and performing actions on it much more intuitive and easy.

Solr: Query for documents whose from-to date range contains the user input

I would like to store and query documents that contain a from-to date range, where the range represents an interval when the document has been valid.
Typical use cases in lucene/solr documentation address the opposite problem: Querying for documents that contain a single timestamp and this timestamp is contained in a date range provided as query parameter. (createdate:[1976-03-06T23:59:59.999Z TO *])
I want to use the edismax parser.
I have found the ms() function, which seems to me to be designed for boosting score only, not to eliminate non-matching results entirely.
I have found the article Spatial Search Tricks for People Who Don't Have Spatial Data, where the problem I am describing is said to be Easy... (Find People Alive On May 25, 1977).
Is there any simpler way to express something like
date_from_query:[valid_from_field TO valid_to_field] than using the spatial approach?
The most direct approach is to create the bounds yourself:
valid_from_field:[* TO date_from_query] AND valid_to_field:[date_from_query TO *]
.. which would give you documents where valid_from_field is earlier than the date you're querying and valid_to_field is later than it; in effect, matching documents whose valid_from_field-to-valid_to_field interval contains the query date. This assumes that neither field is multi-valued.
I'd probably add it as a filter query, since you don't need any scoring from it, and you probably want to allow other search queries at the same time.
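For instance, reusing the date from the article mentioned above, the filter query might look like this (field names are the placeholders from the question):
fq=valid_from_field:[* TO 1977-05-25T00:00:00Z] AND valid_to_field:[1977-05-25T00:00:00Z TO *]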

Should I use the timestamp in "_id"?

I need to monitor when records are created, for further querying and modification.
The first thing that flashed into my mind was to give the document a "createDateTime" field with a default value of new Date(). But MongoDB's documentation says the document _id has a timestamp embedded in it, generated when the document was created, so it seems redundant to add a new field for that.
Far too many times I've seen people set a "createDateTime" on their data, and I don't know whether they know these details of MongoDB's _id.
I want to know: should I use the _id as a "createDateTime" field? What is the best practice, and what are the pros and cons?
Thanks for any tips.
I'd actually say it depends on how you want to use the date.
For example, the timestamp inside an ObjectId is not usable with the aggregation framework's Date operators.
This will fail for example:
db.test.aggregate([ { $group : { _id: { $year: "$_id" } } } ])
The following error occurs:
"errmsg" : "exception: can't convert from BSON type OID to Date"
(The date cannot be extracted from the ObjectId.)
So, normally simple date operations become much more complex if you want to do any sort of date math in an aggregation. It would be far easier to have a createDateTime stamp. Counting the number of documents created in a particular year and month, for example, is simple when aggregating on a dedicated createdDateTime field.
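A sketch of that count, assuming a createdDateTime field of BSON Date type:
db.test.aggregate([
    { $group : {
        _id : { year : { $year : "$createdDateTime" }, month : { $month : "$createdDateTime" } },
        count : { $sum : 1 }
    } }
])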
You can sort on an ObjectId, to some degree. The remaining 8 bytes of the ObjectId aren't sortable in a meaningful way. Most MongoDB drivers default to creating the ObjectId within the driver and not on the database. So, if you've got multiple clients (like web servers for example) creating new documents (and new ObjectIds), the time stamps will only be as accurate as the various servers.
Also, depending on the precision you need: an ISODate value is stored using 8 bytes, rather than the 4 used for the timestamp in an ObjectId.
Yes, you should. There is no reason not to, besides human readability when looking directly into the database. See also here and here.
If you want to use the aggregation framework to group by the date within _id, this is not possible yet as WiredPrairie correctly said. There is an open jira ticket for that, you might watch. But of course you can do this with Map-Reduce and ObjectID.getTimestamp(). An example for that can be found here.

MongoDB - most efficient way of getting the last version of a document

I'm using MongoDB to hold a collection of documents.
Each document has an _id (version), which is an ObjectId. Each document also has a documentId that is shared across the different versions; this too is an ObjectId, assigned when the first document was created.
What's the most efficient way of finding the most up-to-date version of a document given the documentId?
I.e. I want to get the record where _id = max(_id) and documentId = x
Do I need to use MapReduce?
Thanks in advance,
Sam
Add an index containing both fields (documentId, _id) and don't use max (what for?). Query with documentId = x, order descending by _id, and limit(1) the results to get the latest. Remember to give the index the proper sort order (descending as well).
Something like that
db.collection.find({documentId : "x"}).sort({_id : -1}).limit(1)
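The compound index to back this, so the sort is served from the index (the descending _id matches the sort direction):
db.collection.createIndex({ documentId : 1, _id : -1 })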
Another, more denormalized approach would be to use a separate collection with documents like:
{
documentId : "x",
latestVersionId : ...
}
Use of atomic operations would allow you to safely update this collection. Adding a proper index would make queries lightning fast.
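A sketch of that atomic update ("latestVersions" is a hypothetical name for the extra collection; newVersionId is the _id of the version just written):
db.latestVersions.updateOne(
    { documentId : "x" },
    { $set : { latestVersionId : newVersionId } },
    { upsert : true }
);
db.latestVersions.createIndex({ documentId : 1 }, { unique : true });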
There is one thing to take into account: I'm not sure whether an ObjectId can always be safely used to order by latest version. Using a timestamp may be a more reliable approach.
I was typing the same as Daimon's first answer, using sort and limit. Because of the way the _id is generated, this is probably not recommended, especially with some drivers (which use random numbers instead of increments for the least significant portion). The most significant portion has second resolution (as opposed to something smaller, like milliseconds), but the last number can be random. So if a user saved twice in a second (probably not likely, but worth noting), you might end up with the latest documents slightly out of order.
See http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification for more details on the structure of the ObjectID.
I would recommend adding an explicit versionNumber field to your documents, so you can query in a similar fashion using that field, like so:
db.coll.find({documentId: <id>}).sort({versionNum: -1}).limit(1);
edit to answer question in comments
You can store a regular DateTime directly in MongoDB, but it will only be stored with millisecond precision, in MongoDB's "DateTime" format. If that's good enough, it's simpler to do.
BsonDocument doc = new BsonDocument("dt", DateTime.UtcNow);
coll.Insert (doc);
doc = coll.FindOne();
// see it doesn't have precision...
Console.WriteLine(doc.GetValue("dt").AsUniversalTime.Ticks);
If you want .NET DateTime (ticks)/Timestamp precision, you can do a bunch of casts to get it to work, like:
BsonDocument doc = new BsonDocument("dt", new BsonTimestamp(DateTime.UtcNow.Ticks));
coll.Insert (doc);
doc = coll.FindOne();
// see it does have precision
Console.WriteLine(new DateTime(doc.GetValue("dt").AsBsonTimestamp.Value).Ticks);
update again!
Looks like the real use for BsonTimestamp is to generate unique timestamps within one-second resolution. So you're not really supposed to abuse them as I have in the last few lines of code, and doing so will probably screw up the ordering of results. If you need to store the DateTime at tick (100-nanosecond) resolution, you should probably just store the 64-bit int "ticks", which will be sortable in MongoDB, and then wrap it in a DateTime after you pull it out of the database, like so:
BsonDocument doc = new BsonDocument("dt", DateTime.UtcNow.Ticks);
coll.Insert (doc);
doc = coll.FindOne();
DateTime dt = new DateTime(doc.GetValue("dt").AsInt64);
// see it does have precision
Console.WriteLine(dt.Ticks);