How can convert string to date with mongo aggregation? - mongodb

In a collection, I store this kind of document
{
"_id" : 1,
"created_at" : "2016/01/01 12:10:10",
...
}.
{
"_id" : 2,
"created_at" : "2016/01/04 12:10:10",
...
}
I would like to find documents have "creared_at" > 2016/01/01 by using aggregation pipeline.
Anybody have solution to convert "created_at" to date so can conpare in aggregation?

All the above answers use cursors but however, mongodb always recommend to use aggregation pipeline. With the new $dateFromString in mongodb 3.6, its pretty much simple.
https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/
db.collection.aggregate([
{$project:{ created_at:{$dateFromString:{dateString:'$created_at'}}}}
])

As you have mentioned, you need to first change your schema so that the created_at field holds date objects as opposed to string as is the current situation, then you can query your collection either using the find() method or the aggregation framework. The former would be the most simple approach.
To convert created_at to date field, you would need to iterate the cursor returned by the find() method using the forEach() method, within the loop convert the created_at field to a Date object and then update the field using the $set operator.
Take advantage of using the Bulk API for bulk updates which offer better performance as you will be sending the operations to the server in batches of say 1000 which gives you a better performance as you are not sending every request to the server, just once in every 1000 requests.
The following demonstrates this approach, the first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all
the documents in the collection by changing the created_at fields to date fields:
var bulk = db.collection.initializeUnorderedBulkOp(),
counter = 0;
db.collection.find({"created_at": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
var newDate = new Date(doc.created_at);
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "created_at": newDate}
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
bulk = db.collection.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to the new MongoDB version 3.2 which has since deprecated the Bulk API and provided a newer set of apis using bulkWrite():
var cursor = db.collection.find({"created_at": {"$exists": true, "$type": 2 }}),
bulkOps = [];
cursor.forEach(function (doc) {
var newDate = new Date(doc.created_at);
bulkOps.push(
{
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "created_at": newDate } }
}
}
);
if (bulkOps.length === 1000) {
db.collection.bulkWrite(bulkOps);
bulkOps = [];
}
});
if (bulkOps.length > 0) { db.collection.bulkWrite(bulkOps); }
Once the schema modification is complete, you can then query your collection for the date:
var dt = new Date("2016/01/01");
db.collection.find({ "created_at": { "$gt": dt } });
And should you wish to query using the aggregation framework, run the following pipeline to get the desired result. It uses the $match operator, which is similar to the find() method:
var dt = new Date("2016/01/01");
db.collection.aggregate([
{
"$match": { "created_at": { "$gt": dt } }
}
])

If we have documents:
db.doc.save({ "_id" : 1, "created_at" : "2016/01/01 12:10:10" })
db.doc.save({ "_id" : 2, "created_at" : "2016/01/04 12:10:10" })
Simple query:
db.doc.find({ "created_at" : {"$lte": Date()} })
Aggregate query:
db.doc.aggregate([{
"$match": { "created_at": { "$lte": Date() } }
}])
Date() method which returns the current date as a string.
new Date() constructor which returns a Date object using the ISODate() wrapper.
ISODate() constructor which returns a Date object using the ISODate()
wrapper.
More information about the type of date here and here

Related

Mongodb documents which dates contain non-zero timestamp

Sorry for the title, I find it hard to explain in few words...
I have an assets collection with a date field. This field should only have the date part and the rest should be zeros, ie: ISODate("2018-07-18T00:00:00.000Z") however due to a bug I have some dates like: ISODate("2017-12-16T08:35:20.201Z").
I would like to find those offending documents and fix their dates.
Is there any way to query something like
{
date: {
$miliseconds: {
$not: 0
}
}
}
You can use the cursor from the aggregate query and bulk updates to update the matching documents in 3.6 version.
Here is the shell sample.
var bulk = db.colname.initializeUnorderedBulkOp();
var count = 0;
var batch = 50; // Change batch size as you need
db.colname.aggregate([
{$match : {$expr: {$ne:[ {$millisecond: "$date" },0]}}},
{$project:{
date:{$dateFromParts:{
year:{$year:"$date"},
month:{$month:"$date"},
day:{$dayOfMonth:"$date"}
}}
}}
]).forEach(function(doc){
bulk.find( {"_id" : doc._id}).updateOne(
{ "$set": {"date" : doc.date}}
);
count++;
if (count == batch) {
bulk.execute();
bulk = db.colname.initializeUnorderedBulkOp();
count = 0;
}
});
if (count > 0) {
bulk.execute();
}

Using $sum on a existent field returns a value of 0 [duplicate]

I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema has the marks field data type as string and you need an integer data type for your aggregation framework to work out the sum. On the other hand, you can use MapReduce to calculate the sum since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb's cursor find(), forEach() and update() methods to change your marks schema:
db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
db.student.update(
{ "_id": doc._id, "marks": { "$type": 2 } },
{ "$set": { "marks": parseInt(doc.marks) } }
);
});
For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.student.initializeUnorderedBulkOp(),
counter = 0;
db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "marks": parseInt(doc.marks) }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.student.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "marks": parseInt(doc.marks) } }
}
});
if (ops.length === 1000) {
db.student.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.student.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.student.mapReduce(
mapper,
reducer,
{
out : "example_results",
query: { subject : "maths" }
}
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
Possible causes your sum is being returned 0 are :
The field you are summing up is not an integer but a string.
Make sure the field contains numeric values.
You are using wrong syntax of $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the most silly mistake due to which this error occurs is:
Use of space or tab inside the quotes while specifying field name.
Example - "$price " won't work !!! But, "$price" would work.

MongoDB convert string type to float type

Following the suggestions over here MongoDB: How to change the type of a field? I tried to update my collection to change the type of field and its value.
Here is the update query
db.MyCollection.find({"ProjectID" : 44, "Cost": {$exists: true}}).forEach(function(doc){
if(doc.Cost.length > 0){
var newCost = doc.Cost.replace(/,/g, '').replace(/\$/g, '');
doc.Cost = parseFloat(newCost).toFixed(2);
db.MyCollection.save(doc);
} // End of If Condition
}) // End of foreach
upon completion of the above query, when I run the following command
db.MyCollection.find({"ProjectID" : 44},{Cost:1})
I still have Cost field as string.
{
"_id" : ObjectId("576919b66bab3bfcb9ff0915"),
"Cost" : "11531.23"
}
/* 7 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0916"),
"Cost" : "13900.64"
}
/* 8 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0917"),
"Cost" : "15000.86"
}
What am I doing wrong here?
Here is the sample document
/* 2 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0911"),
"Cost" : "$7,100.00"
}
/* 3 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0912"),
"Cost" : "$14,500.00"
}
/* 4 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0913"),
"Cost" : "$12,619.00"
}
/* 5 */
{
"_id" : ObjectId("576919b66bab3bfcb9ff0914"),
"Cost" : "$9,250.00"
}
The problem is that toFixed returns an String, not a Number. Then your are just updating the document with a new, and different String.
Example from Mongo Shell:
> number = 2.3431
2.3431
> number.toFixed(2)
2.34
> typeof number.toFixed(2)
string
If you want a 2 decimals number you must parse it again with something like:
db.MyCollection.find({"ProjectID" : 44, "Cost": {$exists: true}}).forEach(function(doc){
if(doc.Cost.length > 0){
var newCost = doc.Cost.replace(/,/g, '').replace(/\$/g, '');
var costString = parseFloat(newCost).toFixed(2);
doc.Cost = parseFloat(costString);
db.MyCollection.save(doc);
} // End of If Condition
}) // End of foreach
Follow this pattern to convert a currency field of string type to a float. You need to query all the documents in the collection that have the Cost field type string. To do so you would need to take advantage of using the Bulk API for bulk updates. These offer better performance as you will be sending the operations to the server in batches of say 1000, which gives you a better performance as you are not sending every request to the server, but just once in every 1000 requests.
The following demonstrates this approach, the first example uses the Bulk API available in MongoDB versions >= 2.6 and < 3.2. It updates all
the documents in the collection by changing all the Cost fields to floating value fields:
var bulk = db.MyCollection.initializeUnorderedBulkOp(),
counter = 0;
db.MyCollection.find({
"Cost": { "$exists": true, "$type": 2 }
}).forEach(function (doc) {
var newCost = Number(doc.Cost.replace(/[^0-9\.]+/g,""));
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "Cost": newCost }
});
counter++;
if (counter % 1000 == 0) {
bulk.execute(); // Execute per 1000 operations
// re-initialize every 1000 update statements
bulk = db.MyCollection.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to the new MongoDB version 3.2 which has since deprecated the Bulk API and provided a newer set of apis using bulkWrite().
It uses the same cursors as above but creates the arrays with the bulk operations using the same forEach() cursor method to push each bulk write document to the array. Because write commands can accept no more than 1000 operations, you will need to group your operations to have at most 1000 operations and re-intialise the array when loop hit the 1000 iteration:
var cursor = db.MyCollection.find({ "Cost": { "$exists": true, "$type": 2 } }),
bulkUpdateOps = [];
cursor.forEach(function(doc){
var newCost = Number(doc.Cost.replace(/[^0-9\.]+/g,""));
bulkUpdateOps.push({
"updateOne": {
"filter": { "_id": doc._id },
"update": { "$set": { "Cost": newCost } }
}
});
if (bulkUpdateOps.length == 1000) {
db.MyCollection.bulkWrite(bulkUpdateOps);
bulkUpdateOps = [];
}
});
if (bulkUpdateOps.length > 0) { db.MyCollection.bulkWrite(bulkUpdateOps); }
Since mongoDB version 4.2, It can be done entirely inside one mongoDB query using Updates with Aggregation Pipeline:
db.collection.updateMany(
{Cost: {$exists: true}},
[{$set: {
Cost: {
$toDouble: {
$reduce: {
input: {$split: [{$substr: ["$Cost", 1, {$strLenCP: "$Cost"}]}, ","]},
initialValue: "",
in: {$concat: ["$$value", "$$this"]}
}
}
}
}}]
)
See how it works on the playground example

Mongo : How to convert all entries using a long timeStamp to an ISODate?

I have a current Mongo database with the accumulated entries/fields
{
name: "Fred Flintstone",
age : 34,
timeStamp : NumberLong(14283454353543)
}
{
name: "Wilma Flintstone",
age : 33,
timeStamp : NumberLong(14283454359453)
}
And so on...
Question : I want to convert all entries in the database to their corresponding ISODate instead - How does one do this?
Desired Result :
{
name: "Fred Flintstone",
age : 34,
timeStamp : ISODate("2015-07-20T14:50:32.389Z")
}
{
name: "Wilma Flintstone",
age : 33,
timeStamp : ISODate("2015-07-20T14:50:32.389Z")
}
Things I've tried
>db.myCollection.find().forEach(function (document) {
document["timestamp"] = new Date(document["timestamp"])
//Not sure how to update this document from here
db.myCollection.update(document) //?
})
Using the aggregation pipeline for update operations, simply run the following update operation:
db.myCollection.updateMany(
{ },
[
{ $set: {
timeStamp: {
$toDate: '$timeStamp'
}
} },
]
])
With you initial attempt, you were almost there, you just need to call the save() method on the modified document to update it since the method uses either the insert or the update command. In the above instance, the document contains an _id fieldand thus the save() method is equivalent to an update() operation with the upsert option set to true and the query predicate on the _id field:
db.myCollection.find().snapshot().forEach(function (document) {
document["timestamp"] = new Date(document["timestamp"]);
db.myCollection.save(document)
})
The above is similar to explicitly calling the update() method as you had previously attempted:
db.myCollection.find().snapshot().forEach(function (document) {
var date = new Date(document["timestamp"]);
var query = { "_id": document["_id"] }, /* query predicate */
update = { /* update document */
"$set": { "timestamp": date }
},
options = { "upsert": true };
db.myCollection.update(query, update, options);
})
For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.myCollection.initializeUnorderedBulkOp(),
counter = 0;
db.myCollection.find({"timestamp": {"$not": {"$type": 9 }}}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "timestamp": new Date(doc.timestamp") }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.myCollection.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.myCollection.find({"timestamp": {"$not": {"$type": 9 }}});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "timestamp": new Date(doc.timestamp") } }
}
});
if (ops.length === 1000) {
db.myCollection.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.myCollection.bulkWrite(ops);
It seems that there are some cumbersome things happening in mongo when trying to instantiate Date objects from NumberLong values. Mainly becasue the NumberLong values are converted to wrong representations and the fallback to current date is used.
I was fighting 2 days with mongo and finally I found the solution. The key is to convert NumberLong to Double ... and pass double values to Date constructor.
Here is the solution that uses bulb operations and work for me ...
(lastIndexedTimestamp is the collection field that is migrated to ISODate and stored in lastIndexed field. A temporary collection is created, and it is renamed to the original value in the end.)
db.annotation.aggregate( [
{ $project: {
_id: 1,
lastIndexedTimestamp: 1,
lastIndexed: { $add: [new Date(0), {$add: ["$lastIndexedTimestamp", 0]}]}
}
},
{ $out : "annotation_new" }
])
//drop annotation collection
db.annotation.drop();
//rename annotation_new to annotation
db.annotation_new.renameCollection("annotation");

mongodb aggregate query isn't returning proper sum on using $sum

I have a collection students with documents in the following format:-
{
_id:"53fe74a866455060e003c2db",
name:"sam",
subject:"maths",
marks:"77"
}
{
_id:"53fe79cbef038fee879263d2",
name:"ryan",
subject:"bio",
marks:"82"
}
{
_id:"53fe74a866456060e003c2de",
name:"tony",
subject:"maths",
marks:"86"
}
I want to get the count of total marks of all the students with subject = "maths". So I should get 163 as sum.
db.students.aggregate([{ $match : { subject : "maths" } },
{ "$group" : { _id : "$subject", totalMarks : { $sum : "$marks" } } }])
Now I should get the following result-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":163}], "ok":1}
But I get-
{"result":[{"_id":"53fe74a866455060e003c2db", "totalMarks":0}], "ok":1}
Can someone point out what I might be doing wrong here?
Your current schema has the marks field data type as string and you need an integer data type for your aggregation framework to work out the sum. On the other hand, you can use MapReduce to calculate the sum since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.
Option 1: Update Schema (Change Data Type)
The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb's cursor find(), forEach() and update() methods to change your marks schema:
db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
db.student.update(
{ "_id": doc._id, "marks": { "$type": 2 } },
{ "$set": { "marks": parseInt(doc.marks) } }
);
});
For relatively large collection sizes, your db performance will be slow and it's recommended to use mongo bulk updates for this:
MongoDB versions >= 2.6 and < 3.2:
var bulk = db.student.initializeUnorderedBulkOp(),
counter = 0;
db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
bulk.find({ "_id": doc._id }).updateOne({
"$set": { "marks": parseInt(doc.marks) }
});
counter++;
if (counter % 1000 === 0) {
// Execute per 1000 operations
bulk.execute();
// re-initialize every 1000 update statements
bulk = db.student.initializeUnorderedBulkOp();
}
})
// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute();
MongoDB version 3.2 and newer:
var ops = [],
cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});
cursor.forEach(function (doc) {
ops.push({
"updateOne": {
"filter": { "_id": doc._id } ,
"update": { "$set": { "marks": parseInt(doc.marks) } }
}
});
if (ops.length === 1000) {
db.student.bulkWrite(ops);
ops = [];
}
});
if (ops.length > 0) db.student.bulkWrite(ops);
Option 2: Run MapReduce
The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().
In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:
var mapper = function () {
var x = parseInt(this.marks);
emit(this.subject, x);
};
Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.
var reducer = function(keySubject, valuesMarks) {
return Array.sum(valuesMarks);
};
db.student.mapReduce(
mapper,
reducer,
{
out : "example_results",
query: { subject : "maths" }
}
);
With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:
/* 0 */
{
"_id" : "maths",
"value" : 163
}
Possible causes your sum is being returned 0 are :
The field you are summing up is not an integer but a string.
Make sure the field contains numeric values.
You are using wrong syntax of $sum.
db.c1.aggregate([{
$group: {
_id: "$item",
price: {
$sum: "$price"
},
count: {
$sum: 1
}
}
}])
Make sure you use "$price" and not "price".
One of the most silly mistake due to which this error occurs is:
Use of space or tab inside the quotes while specifying field name.
Example - "$price " won't work !!! But, "$price" would work.