Search between two dates using Lucene.Net

In my Lucene.Net index, I have documents with a startDate field and an endDate field. Both fields store dates in yyyyMMdd format. How can I build a query that will return hits if today's date falls between those two dates?
startDateFieldValue < myTargetDate < endDateFieldValue
For example, if myTargetDate is 17760604, I'd want to get a document back that had a startDate field value of 10660101 and an endDate field value of 19990101.
The scenario is that I have a Lucene database with Lucene documents that represent particular building sites. Each site has a StartConstruction date and an EndConstruction date. My users will enter a specific date, and I want to find all properties that were currently under construction on that date.
Note: I'm working with Lucene.Net 1.9, a much older version, and my company can't upgrade (yet).

You can do this with a range query, specifically a NumericRangeQuery. Begin by indexing your dates using a NumericField and adding them to your document like:
var df = new NumericField(Fields.AmendedDate);
df.SetIntValue(int.Parse(itemToIndex.startDate.ToString("yyyyMMdd")));
doc.Add(df);
You can make your indexing a little faster by reusing your NumericField across many documents; see the documentation. With your dates all nicely indexed, you are now ready to search across them. To do this we use a NumericRangeQuery:
var q = NumericRangeQuery.NewIntRange( Fields.AmendedDate,
int.Parse(SearchFrom.ToString("yyyyMMdd")),
int.Parse(SearchTo.ToString("yyyyMMdd")),
true, true);
This query can then be used on its own or added to an existing BooleanQuery like:
masterQuery.Add(q, BooleanClause.Occur.MUST);
Splitting your search in this way is far faster than a textual term search because of how numeric fields are indexed. Your resolution (here, day level) can also be altered to give a better spread across your data: if you need the hour, minute or second, add them to the string from most to least significant. Finally, because this is a normal query and not a filter, you skip the filtering step of your search.
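To illustrate the point about resolution, here is a minimal Python sketch (not Lucene.Net code; the function name `sortable_key` is hypothetical). It builds an integer key ordered from most to least significant component, so numeric order matches chronological order. It also shows that a minute-level key no longer fits in a 32-bit integer, so a long range would be needed rather than an int range.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1

def sortable_key(dt, resolution="day"):
    """Return an integer whose numeric order matches chronological order."""
    formats = {
        "day": "%Y%m%d",
        "hour": "%Y%m%d%H",
        "minute": "%Y%m%d%H%M",
        "second": "%Y%m%d%H%M%S",
    }
    return int(dt.strftime(formats[resolution]))

earlier = datetime(2013, 7, 21, 18, 30)
later = datetime(2013, 7, 22, 9, 5)

day_key = sortable_key(earlier, "day")        # fits in a 32-bit int
minute_key = sortable_key(earlier, "minute")  # needs a 64-bit integer

print(day_key, minute_key, minute_key > INT32_MAX)
```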

I'm not sure I phrased my question properly. I want to find out if a particular item was active between a start and an end date. The StartDate is stored in one Lucene field, the EndDate in another.
Here's the search snippet I used:
var searchableDate = DateTools.DateToString(dateToSearchFor, DateTools.Resolution.DAY);
var lowerRange = new RangeQuery(null, new Term("StartDate", searchableDate), true);
var upperRange = new RangeQuery(new Term("EndDate", searchableDate), null, true);
var activeTodayFilter = new BooleanQuery();
activeTodayFilter.Add(new BooleanClause(lowerRange, BooleanClause.Occur.MUST));
activeTodayFilter.Add(new BooleanClause(upperRange, BooleanClause.Occur.MUST));
return activeTodayFilter;
I found the solution in an old Lucene forum/newsgroup, but I'm afraid I don't remember the link.
If there's an easier/better way to write the query above, let me know.
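The logic of the two open-ended ranges can be checked outside Lucene with a small Python sketch. It assumes the dates are fixed-width yyyyMMdd strings, so plain string comparison orders them the same way as the calendar dates; the function name and the sample sites are made up for illustration.

```python
def is_active_on(start_date, end_date, target_date):
    """True when start_date <= target_date <= end_date (inclusive bounds).

    All arguments are yyyyMMdd strings, so string comparison gives
    the same ordering as the dates themselves.
    """
    return start_date <= target_date and target_date <= end_date

# Hypothetical building sites
sites = [
    {"name": "A", "StartDate": "10660101", "EndDate": "19990101"},
    {"name": "B", "StartDate": "18000101", "EndDate": "18500101"},
]

target = "17760604"
active = [s["name"] for s in sites
          if is_active_on(s["StartDate"], s["EndDate"], target)]
print(active)
```

Each half of the condition corresponds to one of the two MUST clauses: StartDate up to the target, and EndDate from the target onwards.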

You have to use a RangeQuery.
RangeQuery rq = new RangeQuery(new Term("date", "10660101"), new Term("date", "19990101"), true);
In an up-to-date version you could use NumericFields/NumericRangeQuery for better performance.

Related

mongoDB data filter by range

I have one collection where I want to filter data based on a range.
Range will go like this:
Last week (last_week)
Last month (last_month)
Last quarter (last_quarter)
I want to take input as last_week or last_month or last_quarter and based upon the input given, I want to filter data from documents matching the criteria supplied.
Is it possible to build a criteria on the fly based upon the input given?
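Yes, you can. Compute the start of the requested range from the input and build your criteria like this:
var query = {};
query.createdDate = {
$gte: rangeStart, // hypothetical Date computed from last_week, last_month or last_quarter
$lt: new Date()
};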
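One way to build such criteria on the fly is sketched below in Python. The window lengths (7, 30 and 90 days) and the function name `build_criteria` are assumptions, not something given in the question; the resulting dictionary has the shape a MongoDB driver would accept as a filter on createdDate.

```python
from datetime import datetime, timedelta, timezone

# Assumed window lengths; "month" and "quarter" are approximated in days.
RANGE_DAYS = {"last_week": 7, "last_month": 30, "last_quarter": 90}

def build_criteria(range_name, now=None):
    """Return a filter matching createdDate within the named window."""
    if range_name not in RANGE_DAYS:
        raise ValueError("unknown range: %r" % range_name)
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - timedelta(days=RANGE_DAYS[range_name])
    return {"createdDate": {"$gte": start, "$lt": now}}

fixed_now = datetime(2013, 7, 21)
criteria = build_criteria("last_week", fixed_now)
print(criteria)
```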

Solr: Query for documents whose from-to date range contains the user input

I would like to store and query documents that contain a from-to date range, where the range represents an interval when the document has been valid.
Typical use cases in lucene/solr documentation address the opposite problem: Querying for documents that contain a single timestamp and this timestamp is contained in a date range provided as query parameter. (createdate:[1976-03-06T23:59:59.999Z TO *])
I want to use the edismax parser.
I have found the ms() function, which seems to me to be designed for boosting score only, not to eliminate non-matching results entirely.
I have found the article Spatial Search Tricks for People Who Don't Have Spatial Data, where the problem described by me is said to be Easy... (Find People Alive On May 25, 1977).
Is there any simpler way to express something like
date_from_query:[valid_from_field TO valid_to_field] than using the spatial approach?
The most direct approach is to create the bounds yourself:
valid_from_field:[* TO date_from_query] AND valid_to_field:[date_from_query TO *]
This gives you documents where valid_from_field is on or before the date you're querying and valid_to_field is on or after it; in effect, documents whose validity interval contains the queried date. This assumes that neither field is multi-valued.
I'd probably add it as a filter query, since you don't need any scoring from it, and you probably want to allow other search queries at the same time.
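As a rough illustration, the filter query string could be assembled like this in Python. The helper name `valid_on_filter` is hypothetical, and the sketch only builds the string; sending it to Solr as an fq parameter is left to whatever client you use.

```python
from datetime import datetime, timezone

def valid_on_filter(moment, from_field="valid_from_field",
                    to_field="valid_to_field"):
    """Build a Solr filter for documents whose interval contains `moment`."""
    # Solr date fields expect UTC timestamps ending in Z.
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return "%s:[* TO %s] AND %s:[%s TO *]" % (from_field, stamp, to_field, stamp)

fq = valid_on_filter(datetime(1977, 5, 25, tzinfo=timezone.utc))
print(fq)
```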

mongo query based on field calculation

I am looking for a way to query mongo for documents matching the results between two fields when compared to a variable.
For example, overlapping date ranges. I have a document with the following schema:
{startDate : someDate, endDate : otherDate, restrictions : {daysBefore : 5, daysAfter : 5}}
My user will supply their own date range like
var userInfo = { from : Date, to : Date}
I need the documents that satisfy this condition:
startDate - restrictions.daysBefore <= userInfo.to && endDate + restrictions.daysAfter >= userInfo.from;
I tried using a $where clause, but I lose the context of the to and from since they are defined outside of the scope of the where function.
I would like to do this without pulling down all of the results, or creating another field upon insert.
Is there a simple way this query can be done?
The aggregation framework [AF] will do what you want. As an added bonus, the AF backend is written in C++ and is therefore much faster than using JavaScript. Besides being faster than JavaScript, there are a number of reasons we discourage the use of $where, some of which can be found in the $where docs.
The AF docs(i.e. the good stuff to use):
http://docs.mongodb.org/manual/reference/aggregation/
I am uncertain of the format of the data you are storing, and this will also have an effect on performance. For instance, if the date is the standard date of milliseconds since Jan 1st 1970 (Unix epoch) and daysBefore is stored as (milliseconds per day) * (number of days), you can use simple math as the example below does. This is very fast. If not, there are date conversions available in the AF, but doing the conversions in addition to computing the differences is of course more expensive.
In Python (your profile mentions Django) datetime.timedelta can be used for daysBefore. For instance, for 5 days:
import datetime
daysBefore=datetime.timedelta(5)
There are two main ways to do this in the AF: do the calculation directly and match on it, or create a new field and match against that. For complicated or large-scale deployments you will need to test both against your specific use case. An aggregate command from the shell that matches on the calculation directly (the $expr operator requires MongoDB 3.6 or later):
toDate=<program provided>
db.collection.aggregate([{"$match": {"$expr": {"$lte": [{"$subtract": ["$startDate", "$restrictions.daysBefore"]}, toDate]}}}])
If you want to run multiple calculations in the same $match, combine them inside $expr with $and:[{}, {}, …, {}]. I omitted that for clarity.
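For the full overlap condition from the question, a pipeline could be built in Python along these lines. This is a sketch under assumptions: it relies on the $expr operator (MongoDB 3.6 or later), it assumes startDate and endDate are stored as real dates, and it assumes daysBefore and daysAfter hold a plain number of days as in the question's schema, so they are multiplied into milliseconds. The function name is hypothetical.

```python
from datetime import datetime

MS_PER_DAY = 24 * 60 * 60 * 1000

def overlap_pipeline(user_from, user_to):
    """Pipeline for: startDate - daysBefore <= user_to
                 and endDate + daysAfter >= user_from."""
    before_ms = {"$multiply": ["$restrictions.daysBefore", MS_PER_DAY]}
    after_ms = {"$multiply": ["$restrictions.daysAfter", MS_PER_DAY]}
    # Subtracting or adding a number to a date shifts it by milliseconds.
    padded_start = {"$subtract": ["$startDate", before_ms]}
    padded_end = {"$add": ["$endDate", after_ms]}
    return [{
        "$match": {
            "$expr": {
                "$and": [
                    {"$lte": [padded_start, user_to]},
                    {"$gte": [padded_end, user_from]},
                ]
            }
        }
    }]

pipeline = overlap_pipeline(datetime(2013, 7, 1), datetime(2013, 7, 31))
print(pipeline)
```

The list can be passed as-is to a driver's aggregate call on the collection.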
Further aggregation documentation for the AF can be found at:
http://api.mongodb.org/python/current/examples/aggregation.html#aggregation-framework
Note that “aggregation” also includes map reduce in Mongo, but in this case the AF should be able to do it all (and much more quickly).
If you need any further information about the AF or if there is anything the docs don’t make clear, please don’t hesitate to ask.
Best,
Charlie

How to query date saved as text in bad date format in mongoDB

I am very new to mongodb
I have a database with sale_date and the value is saved as text and the format is "dd:mm:yyyy". Now I want to query based on the date. Like I want to query the last month's entry.
I also have field sale_time and also saved as text and the format is "hh:mm" and I want to query the last hour's entry.
I want to query from Java and also from the mongo console.
One row of my collection:
{
"_id":112350,
"sale_date":"21.07.2011",
"sale_time":"18:50",
"store_id":"OK3889-45",
"region_code":45,
"product_id":"QKDGLHX5061",
"product_catagorie":53,
"no_of_product":1,
"price":1211.37,
"total_price":1211.37
}
I have millions of entries. Now I want to find the entries for the month of July 2011 or the hour from 18:00 to 19:00 on 21.07.2013.
You can query with a regex matching your results. You said the format is dd:mm:yyyy, but the example looks like dd.mm.yyyy, so I used that in the examples.
For example:
db.sales.find({sale_date: /..\.07\.2011/})
This will be inefficient since it can't use an index, but it will get the job done.
It would be better, if you stick with dates as strings, to reverse the order to yyyy.mm.dd; then you could use an anchored regex, which can use an index, like:
db.sales.find({sale_date: /^2011\.07/})
For the hour query:
db.sales.find({sale_date: "21.07.2013", sale_time: {$gte: "18:00", $lt: "19:00"}})
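The matching behaviour of these two queries can be tried out in plain Python. This is only a sketch of the comparison logic on hypothetical sample rows, not a MongoDB call: the regex picks out July 2011, and the hour check relies on zero-padded hh:mm strings comparing in time order.

```python
import re

# Hypothetical sample rows in the question's format
sales = [
    {"sale_date": "21.07.2011", "sale_time": "18:50"},
    {"sale_date": "03.08.2011", "sale_time": "09:15"},
    {"sale_date": "21.07.2013", "sale_time": "19:00"},
]

# Any day, month 07, year 2011
july_2011 = re.compile(r"^..\.07\.2011$")
in_july = [s for s in sales if july_2011.search(s["sale_date"])]

def in_hour(sale_time, start="18:00", end="19:00"):
    """Half-open range check, like {$gte: start, $lt: end}."""
    return start <= sale_time < end

print(len(in_july), [in_hour(s["sale_time"]) for s in sales])
```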
There is no efficient and reliable way to query for the date range you want given the date structure you have used. A regex-style query would scan through the entire collection, for example, and if you have millions of documents, that's not acceptable.
You could theoretically create a MapReduce to better structure the data into a new collection. But, that will be more work to maintain (as MapReduces aren't automatically updated, and may make other queries and data fetching involve more steps than you'd like).
Although, if you're willing to do that, I'd strongly suggest you instead just fix your data, as I mentioned in my comment, to use a standard YYYYMMDD format. Even better, you may want to consider merging the time into the same field as a timestamp:
2013-07-21T14:30
If you don't though, you can still do the single date query reasonably (although you'd want to index both the date and time as a compound index):
db.sales.ensureIndex({ sale_date: 1, sale_time: 1})
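Regarding the Java code, it's basically going to look like this:
import com.mongodb.BasicDBObject;
import com.mongodb.DBCursor;
import java.util.ArrayList;
import java.util.List;
// Match a single day...
BasicDBObject date = new BasicDBObject("sale_date", "21.07.2013");
// ...and the hour from 18:00 (inclusive) to 19:00 (exclusive).
BasicDBObject time = new BasicDBObject("sale_time",
new BasicDBObject("$gte", "18:00").
append("$lt", "19:00"));
BasicDBObject andQuery = new BasicDBObject();
List<BasicDBObject> obj = new ArrayList<BasicDBObject>();
obj.add(date);
obj.add(time);
andQuery.put("$and", obj);
DBCursor cursor = coll.find(andQuery);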
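The suggestion to restructure the data can be sketched as a small conversion step. The Python below (the function name is hypothetical) merges the dd.mm.yyyy date and hh:mm time into one sortable timestamp string, after which a month becomes a simple range comparison.

```python
from datetime import datetime

def to_iso_timestamp(sale_date, sale_time):
    """Merge 'dd.mm.yyyy' and 'hh:mm' into a sortable 'yyyy-mm-ddThh:mm'."""
    parsed = datetime.strptime(sale_date + " " + sale_time, "%d.%m.%Y %H:%M")
    return parsed.strftime("%Y-%m-%dT%H:%M")

stamp = to_iso_timestamp("21.07.2011", "18:50")

# With this layout, "July 2011" is a half-open string range.
in_july_2011 = "2011-07-01T00:00" <= stamp < "2011-08-01T00:00"
print(stamp, in_july_2011)
```

Storing a real date type instead of a string would serve the same purpose; the string form is shown only because it matches the layout suggested above.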

MongoDB - most efficient way of getting the last version of a document

I'm using MongoDB to hold a collection of documents.
Each document has an _id (version) which is an ObjectId. Each document has a documentId which is shared across the different versions. This too is an ObjectId, assigned when the first document was created.
What's the most efficient way of finding the most up-to-date version of a document given the documentId?
I.e. I want to get the record where _id = max(_id) and documentId = x
Do I need to use MapReduce?
Thanks in advance,
Sam
Add an index containing both fields (documentId, _id) and don't use max; there's no need for it. Query with documentId = x, order by _id descending and limit(1) the results to get the latest. Remember to give the index the matching sort order (descending on _id).
Something like that
db.collection.find({documentId : "x"}).sort({_id : -1}).limit(1)
Another approach (more denormalized) would be to use another collection with documents like:
{
documentId : "x",
latestVersionId : ...
}
Using atomic operations would allow you to update this collection safely. Adding a proper index would make queries fast as lightning.
There is one thing to take into account: I'm not sure whether ObjectId can always be safely used to order by latest version. Using a timestamp may be a more certain approach.
I was typing the same as Daimon's first answer, using sort and limit. This is probably not recommended, especially with some drivers (which use random numbers instead of increments for the least significant portion), because of the way the _id is generated. It has second [as opposed to something smaller, like millisecond] resolution as the most significant portion, but the last number can be a random number. So if you had a user save twice in a second (probably not likely, but worth noting), you might end up with a slightly out of order latest document.
See http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification for more details on the structure of the ObjectID.
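To make the resolution issue concrete, here is a short Python sketch that reads the creation time out of an ObjectId's hex form. It relies only on the first 4 bytes being a big-endian count of seconds since the Unix epoch; the function name and the two sample ids are made up for illustration.

```python
from datetime import datetime, timezone

def object_id_time(object_id_hex):
    """Return the second-resolution creation time embedded in an ObjectId."""
    if len(object_id_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(object_id_hex[:8], 16)  # first 4 bytes: seconds since epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# Two hypothetical ids generated within the same second
first = "51ec2d98" + "a1b2c3d4e5f60001"
second = "51ec2d98" + "00b2c3d4e5f60002"

same_second = object_id_time(first) == object_id_time(second)
print(same_second, second < first)
```

Both ids carry the same timestamp, so their relative order is decided entirely by the remaining bytes, which is why a sort on _id cannot be trusted to separate two saves made within one second.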
I would recommend adding an explicit versionNumber field to your documents, so you can query in a similar fashion using that field, like so:
db.coll.find({documentId: <id>}).sort({versionNumber: -1}).limit(1);
edit to answer question in comments
You can store a regular DateTime directly in MongoDB, but it will only be stored with millisecond precision. If that's good enough, it's simpler to do.
BsonDocument doc = new BsonDocument("dt", DateTime.UtcNow);
coll.Insert (doc);
doc = coll.FindOne();
// see it doesn't have precision...
Console.WriteLine(doc.GetValue("dt").AsUniversalTime.Ticks);
If you want .NET DateTime (ticks)/Timestamp precision, you can do a bunch of casts to get it to work, like:
BsonDocument doc = new BsonDocument("dt", new BsonTimestamp(DateTime.UtcNow.Ticks));
coll.Insert (doc);
doc = coll.FindOne();
// see it does have precision
Console.WriteLine(new DateTime(doc.GetValue("dt").AsBsonTimestamp.Value).Ticks);
update again!
Looks like the real use for BsonTimestamp is to generate unique timestamps within a second resolution. So, you're not really supposed to abuse them as I have in the last few lines of code, and it actually will probably screw up the ordering of results. If you need to store the DateTime at a Tick (100 nanosecond) resolution, you probably should just store the 64-bit int "ticks", which will be sortable in mongodb, and then wrap it in a DateTime after you pull it out of the database again, like so:
BsonDocument doc = new BsonDocument("dt", DateTime.UtcNow.Ticks);
coll.Insert (doc);
doc = coll.FindOne();
DateTime dt = new DateTime(doc.GetValue("dt").AsInt64);
// see it does have precision
Console.WriteLine(dt.Ticks);
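The precision difference can be illustrated with a little arithmetic, sketched here in Python rather than C#. A .NET tick is 100 nanoseconds counted from 0001-01-01, so there are 10,000 ticks per millisecond; the helper names below are hypothetical.

```python
from datetime import datetime, timedelta

TICKS_PER_MILLISECOND = 10_000
DOTNET_EPOCH = datetime(1, 1, 1)
UNIX_EPOCH_TICKS = 621_355_968_000_000_000  # ticks at 1970-01-01T00:00:00

def ticks_to_datetime(ticks):
    # Python datetimes hold microseconds, so the last tick digit is dropped.
    return DOTNET_EPOCH + timedelta(microseconds=ticks // 10)

def truncate_to_milliseconds(ticks):
    """What survives when a value is stored with millisecond precision."""
    return ticks - ticks % TICKS_PER_MILLISECOND

ticks = UNIX_EPOCH_TICKS + 1_234_567
stored = truncate_to_milliseconds(ticks)
print(ticks_to_datetime(UNIX_EPOCH_TICKS), ticks - stored)
```

Storing the raw 64-bit tick count keeps all digits and still sorts correctly, which is the approach the last C# snippet takes.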