MongoDB time is stored as a string, how to filter on it? - mongodb

There is a data set where, unfortunately, time is not stored in datetime ISO format but as a string, something like:
{"time" : "2015-08-28 09:24:30"}
Is there a way to filter records based on this time variable?
Converting all the data to timestamps is the right way, but is there a way to do this without it?

So the "real" answer here is "don't do it", as converting your "strings" to a "BSON date" is a very trivial process. Best done in the MongoDB shell as a "one off" operation:
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "time": { "$type": 2 } }).forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "time": new Date( doc.time.replace(" ","T") ) }
    });
    count++;

    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});

if ( count % 1000 != 0 )
    bulk.execute();
Of course, adjust for "timezone" as required, but it is a fairly simple case anyway.
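As a quick sanity check outside the shell, the string-to-Date trick above can be run in plain JavaScript (the sample value is taken from the question):

```javascript
// Sample value in the question's "yyyy-mm-dd hh:mm:ss" format
var raw = "2015-08-28 09:24:30";

// Swapping the space for "T" produces an ISO-8601 local datetime,
// which the Date constructor parses reliably
var parsed = new Date(raw.replace(" ", "T"));

console.log(parsed instanceof Date);  // true
console.log(parsed.getFullYear());    // 2015
```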
All the "strings" are then BSON dates, and you can query for a "day", for example, with:
db.collection.find({
    "time": { "$gte": new Date("2015-08-28"), "$lt": new Date("2015-08-29") }
})
And you can do so with relative ease, no matter what your language is, as long as the Date object passed in is supported for serialization via the driver.
But of course, as long as your strings are "lexical" ( which basically means "yyyy-mm-dd hh:mm:ss" ) then you can actually use a "range" with "string values" instead:
db.collection.find({
    "time": {
        "$gte": "2015-08-28 00:00:00",
        "$lt": "2015-08-29 00:00:00"
    }
})
And it works, but it just is not "wise".
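The reason the string range works is that "yyyy-mm-dd hh:mm:ss" strings sort lexically in chronological order, which a plain JavaScript comparison demonstrates:

```javascript
var time = "2015-08-28 09:24:30";

// Lexical comparison matches chronological order for zero-padded strings
console.log(time >= "2015-08-28 00:00:00"); // true
console.log(time <  "2015-08-29 00:00:00"); // true

// But only while the format is strictly zero-padded
console.log("2015-9-01" < "2015-10-01");    // false: "9" sorts after "1"
```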
Change your "strings" to BSON Date. It takes less storage and there is no "mucking around" with working the data into a real "Date" for your language API when you actually need it as such. The work is already done.

Related

Mongodb Query to get only documents in specific days

In my MongoDB collection, I have 2 (relevant for this question) fields: service and timestamp.
I want to query only the documents with service=liveness and with a timestamp of 12th November 2020.
How can I do that if the timestamp field is of type Number (a UNIX epoch value)?
This is my query currently:
{ service: "liveness" }
To query by two fields you only need this syntax:
db.collection.find({
    "field1": yourField1Value,
    "field2": yourField2Value
})
So, if your date is a Number instead of a Date you can try this query:
db.collection.find({
    "service": "liveness",
    "timestamp": 1600768437934
})
And it should work.
Now, if the problem is parsing 12th November 2020 into a UNIX timestamp, then the easiest way is to convert the date in your application language first.
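For example, in JavaScript the day boundaries can be turned into epoch milliseconds like this (a sketch, assuming the day is interpreted in UTC):

```javascript
// Start of 12th November 2020 and of the following day, in UTC
var dayStart = Date.parse("2020-11-12T00:00:00Z"); // ms since the epoch
var nextDay  = Date.parse("2020-11-13T00:00:00Z");

console.log(nextDay - dayStart); // 86400000, i.e. 24 hours in ms
```

These two numbers can then be plugged into a $gte/$lt range on the timestamp field.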
Edit:
Also, I don't know if I've misunderstood your question, but here is another query.
db.collection.aggregate([
    {
        "$match": {
            "service": "liveness"
        }
    },
    {
        "$project": {
            "timestamp": {
                "$toDate": "$timestamp"
            }
        }
    },
    {
        "$match": {
            "timestamp": {
                "$gt": ISODate("1990-01-01"),
                "$lt": ISODate("2060-01-01")
            }
        }
    }
])
This query first matches all documents with service as liveness, so the next stage is faster. In $project the timestamp is parsed to a Date, so you can match again by date.
Using $gt and $lt you can search over a whole day.
Also, if you can get the day boundaries as UNIX timestamps, you can do this:
db.collection.find({
    "service": "liveness",
    "timestamp": {
        "$gte": yourDay,
        "$lt": nextDay
    }
})
Using $gte and $lt you ensure the query will find all values in the day.

How can I convert a string to a date with Mongo aggregation?

In a collection, I store this kind of document
{
"_id" : 1,
"created_at" : "2016/01/01 12:10:10",
...
}.
{
"_id" : 2,
"created_at" : "2016/01/04 12:10:10",
...
}
I would like to find documents that have "created_at" > 2016/01/01 by using the aggregation pipeline.
Does anybody have a solution to convert "created_at" to a date so it can be compared in aggregation?
All the above answers use cursors; however, MongoDB recommends using the aggregation pipeline. With the new $dateFromString in MongoDB 3.6, it's pretty simple.
https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/
db.collection.aggregate([
    { "$project": { "created_at": { "$dateFromString": { "dateString": "$created_at" } } } }
])
As you have mentioned, you need to first change your schema so that the created_at field holds Date objects as opposed to strings, as is currently the situation; then you can query your collection using either the find() method or the aggregation framework. The former would be the simplest approach.
To convert created_at to date field, you would need to iterate the cursor returned by the find() method using the forEach() method, within the loop convert the created_at field to a Date object and then update the field using the $set operator.
Take advantage of the Bulk API for bulk updates, which offers better performance: you send the operations to the server in batches of, say, 1000, rather than sending every request to the server individually.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2. It updates all the documents in the collection by changing the created_at fields to date fields:
var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;

db.collection.find({ "created_at": { "$exists": true, "$type": 2 } }).forEach(function (doc) {
    var newDate = new Date(doc.created_at);
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "created_at": newDate }
    });
    counter++;

    if (counter % 1000 == 0) {
        bulk.execute(); // Execute per 1000 operations and re-initialize
        bulk = db.collection.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to the new MongoDB version 3.2, which has since deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var cursor = db.collection.find({ "created_at": { "$exists": true, "$type": 2 } }),
    bulkOps = [];

cursor.forEach(function (doc) {
    var newDate = new Date(doc.created_at);
    bulkOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "created_at": newDate } }
        }
    });

    if (bulkOps.length === 1000) {
        db.collection.bulkWrite(bulkOps);
        bulkOps = [];
    }
});

if (bulkOps.length > 0) { db.collection.bulkWrite(bulkOps); }
Once the schema modification is complete, you can then query your collection for the date:
var dt = new Date("2016/01/01");
db.collection.find({ "created_at": { "$gt": dt } });
And should you wish to query using the aggregation framework, run the following pipeline to get the desired result. It uses the $match operator, which is similar to the find() method:
var dt = new Date("2016/01/01");
db.collection.aggregate([
    { "$match": { "created_at": { "$gt": dt } } }
])
If we have documents:
db.doc.save({ "_id" : 1, "created_at" : "2016/01/01 12:10:10" })
db.doc.save({ "_id" : 2, "created_at" : "2016/01/04 12:10:10" })
Simple query:
db.doc.find({ "created_at" : {"$lte": Date()} })
Aggregate query:
db.doc.aggregate([
    { "$match": { "created_at": { "$lte": Date() } } }
])
Date() method, which returns the current date as a string.
new Date() constructor, which returns a Date object using the ISODate() wrapper.
ISODate() constructor, which also returns a Date object using the ISODate() wrapper.
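The difference between Date() and new Date() is easy to verify in any JavaScript runtime:

```javascript
// Date() called as a plain function returns a string
console.log(typeof Date());              // "string"

// new Date() returns an actual Date object
console.log(typeof new Date());          // "object"
console.log(new Date() instanceof Date); // true
```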

How to change the datatype of a value in a MongoDB array?

Having the following document:
{
    "_id" : 1.0000000000000000,
    "l" : [
        114770288670819.0000000000000000,
        NumberLong(10150097174480584)
    ]
}
How can I convert the second element of the "l" array from NumberLong to floating-point (double), like the first element?
Well, as it seems that other responses are either highly vague in execution or suggest writing a new collection or other "unsafe" operations, then I suppose a response that you should follow is in order.
It's actually not a simple problem to solve ( though others have discounted it as such ) when you take in all the considerations, such as "keeping the order of elements" and "not possibly overwriting other changes to the document", which are important in a production system.
The only real "safe(ish)" way that I can see of doing this, without possibly blowing away your whole array and losing any "active" writes, is with a construct like this ( using Bulk Operations for efficiency ):
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "l": { "$type": 18 } }).forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$pull": { "l": { "$in": doc.l } }
    });
    count++;

    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }

    doc.l.map(function(l,idx) {
        return { "l": parseFloat(l.valueOf()), "idx": idx }
    }).forEach(function(l) {
        bulk.find({ "_id": doc._id }).updateOne({
            "$push": { "l": { "$each": [l.l], "$position": l.idx } }
        });
        count++;

        if ( count % 1000 == 0 ) {
            bulk.execute();
            bulk = db.collection.initializeOrderedBulkOp();
        }
    });
});

if ( count % 1000 != 0 )
    bulk.execute();
This is meant to "bulk" rewrite the data in your entire collection where needed, but on your single sample the result is:
{ "_id" : 1, "l" : [ 114770288670819, 10150097174480584 ] }
So the basics are that this will find any document in your collection that matches the current $type for a NumberLong ( a 64-bit integer in BSON types ) and process the results in the following way.
First remove all the elements from the array with the values of the elements already there.
Convert all elements and add them back into the array at the same position they were found in.
Now I say "safe(ish)" as, while this is mostly "safe" in considering that your documents could well be being updated, having new items appended to the array, or indeed other alterations happening, there is one possible problem here.
If your array "values" are not truly "unique", then there is no real way to "pull" the items that need to be changed and also re-insert them at the same position where they originally occurred. Not in an "atomic" and "safe" way, at any rate.
So the "risk" this process runs is that "if" you happen to append a new array element that has the same value as an existing element ( and only when this happens between the cursor read for the document and the bulk execution ), then you run the "risk" of that new item being removed from the array entirely.
If, however, all values in your array are completely "unique" per document, then this will work "perfectly", and just transpose everything back into place with converted types, with no other "risk" of missing other document updates or important actions.
It's not a simple statement, because the logic demands you take the precaution. But it cannot fail within the constraints already mentioned. And it runs on the collection you have now, without producing a new one.
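The conversion-and-position step at the heart of that loop can be sketched in plain JavaScript, using ordinary numbers as stand-ins for NumberLong (which only exists in the shell):

```javascript
// Stand-in for doc.l from the sample document
var l = [114770288670819, 10150097174480584];

// Map each element to its converted value plus its original index,
// mirroring the "$position" used when pushing elements back
var converted = l.map(function(value, idx) {
    return { "l": parseFloat(value.valueOf()), "idx": idx };
});

console.log(converted[0].l);   // 114770288670819
console.log(converted[1].idx); // 1
```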
Load the document in the mongo shell:
var doc = db.collection.findOne({ "_id": 1 });
Explicitly convert the field to a floating-point number using the parseFloat function:
doc.l[1] = parseFloat(doc.l[1]);
Save the document back into the database:
db.collection.save(doc);
For updating this value you should use either programming-language code or a workaround in JavaScript.
Otherwise, use aggregation with $out, which writes the documents to a new collection, like below:
db.collectionName.aggregate({
    "$unwind": "$l"
}, {
    "$project": {
        "l1": {
            "$divide": ["$l", 1] // divide by one or multiply by one
        }
    }
}, {
    "$group": {
        "_id": "$_id",
        "l": {
            "$push": "$l1"
        }
    }
}, {
    "$out": "newCollectionName" // write data in new collection
})
After running the above query, drop your old collection and use the new collection.

"too much data for sort()" on a small collection

When trying to do a find and sort on a mongodb collection I get the error below. The collection is not large at all - I have only 28 documents and I start getting this error when I cross the limit of 23 records.
The special thing about that document is that it holds a large ArrayCollection inside, but I am not fetching that specific field at all; I am only trying to get a DateTime field.
db.ANEpisodeBreakdown.find({ creationDate: { $exists: true } }, { creationDate: true }).limit(23).sort({ creationDate: 1 })
{ "$err" : "too much data for sort() with no index. add an index or specify a smaller limit", "code" : 10128 }
So the problem here is the 32MB sort limit, and you have no index that can be used for an "index only" or "covered" query to get to the result. Without that, your "big field" still gets loaded into the data to sort.
Easy to replicate:
var string = "";
for ( var n=0; n < 10000000; n++ ) {
    string += 0;
}

for ( var x=0; x < 4; x++ ) {
    db.large.insert({ "large": string, "date": new Date() });
    sleep(1000);
}
So this query will blow up, unless you limit to 3:
db.large.find({},{ "date": 1 }).sort({ "date": -1 })
To overcome this:
Create an index on date (and other used fields) so the whole document is not loaded in your covered index query:
db.large.ensureIndex({ "date": 1 })
db.large.find({},{ "_id": 0, "date": 1 }).sort({ "date": -1 })
{ "date" : ISODate("2014-07-07T10:08:33.067Z") }
{ "date" : ISODate("2014-07-07T10:08:31.747Z") }
{ "date" : ISODate("2014-07-07T10:08:30.391Z") }
{ "date" : ISODate("2014-07-07T10:08:29.038Z") }
Don't index and use aggregate instead, as the $project there does not suffer the same limitations as the document actually gets altered before passing to $sort.
db.large.aggregate([
{ "$project": { "_id": 0, "date": 1 }},
{ "$sort": {"date": -1 }}
])
{ "date" : ISODate("2014-07-07T10:08:33.067Z") }
{ "date" : ISODate("2014-07-07T10:08:31.747Z") }
{ "date" : ISODate("2014-07-07T10:08:30.391Z") }
{ "date" : ISODate("2014-07-07T10:08:29.038Z") }
Either way gets you the results under the limit without modifying cursor limits in any way.
Without an index, the amount you can sort only extends over the shellBatchSize, which by default is 20.
DBQuery.shellBatchSize = 23;
This should do the trick.
The problem is that projection in this particular scenario still loads the entire document; it just sends it to your application without the large array field.
As such, MongoDB is still sorting with too much data for its 32MB limit.

Why are dates in match aggregate query being ignored?

I'm trying to run an aggregation statement in my MongoDB. I have a document whose structure is (at least) as follows:
{
    "_id": ObjectId,
    "date": ISODate,
    "keywordGroupId": NumberLong,
    "ranking": NumberLong
}
I would like to run an aggregation statement that aggregates the 'ranking' field for a given 'keywordGroupId' and a given 'date' interval.
I have been trying with the following aggregate command:
{
    aggregate : "KeywordHistory",
    pipeline : [
        { $match: { keywordGroupId: 75, "$date": { $gte: ISODate("2013-01-01T00:00:00.0Z"), $lt: ISODate("2013-02-01T00:00:00.0Z") } } },
        { $group: { _id: null, count: { $sum: "$ranking" } } }
    ]
}
This command executes without errors and returns a result. If I try to change the value of the 'keywordGroupId' field, the command returns a different value, so I assume that the $match statement works for that field (NumberLong). However, if I change the 'date' range and specify a time interval for which there is no data in the database, it still returns a result (I would actually expect an empty result set). So I have to assume that the $match statement is ignoring the specified date interval.
Can anyone help me with this point?
Remove the $ prefix on the $date field of your $match:
{ $match: {
    keywordGroupId: 75,
    date: { $gte: ISODate("2013-01-01T00:00:00.0Z"), $lt: ISODate("2013-02-01T00:00:00.0Z") }
}},
You only use the $ prefix when the field name is used in a value, not as a key.
Sometimes ISODate does not work, so in case you want to match using only "one" date, the best way is:
For example, let a schema be:
var storeOrder = new Schema({
    store_name: { type: String, required: true },
    date: { type: Date, default: moment(new Date()).format('YYYY-MM-DD') },
    orders: [{
        vegetable: String,
        quantity: Number,
        price: Number
    }]
});

mongoose.model('storeorder', storeOrder);
Now, to aggregate by matching the date:
storeOrder.aggregate([ { $match: { date: new Date("2016-12-26T00:00:00.000Z") } } ])
Note that you must use new Date("2016-12-26T00:00:00.000Z") instead of Date("2016-12-26T00:00:00.000Z"), because Date(your_date) !== new Date(your_date).
The aggregation expects a JavaScript Date object and doesn't work otherwise.
new Date();
new Date(year, month, day);
Please note that months start with 0 and not 1 (January is 0 and December is 11).
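A quick JavaScript check of the zero-indexed months:

```javascript
// Months are zero-indexed in the Date constructor
var d = new Date(2020, 0, 15); // 15th January 2020, local time

console.log(d.getMonth());    // 0, i.e. January
console.log(d.getFullYear()); // 2020
console.log(d.getDate());     // 15
```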