MongoDB documents whose dates contain a non-zero time component

Sorry for the title, I find it hard to explain in a few words...
I have an assets collection with a date field. This field should only have the date part, with the time components zeroed out, e.g. ISODate("2018-07-18T00:00:00.000Z"). However, due to a bug I have some dates like ISODate("2017-12-16T08:35:20.201Z").
I would like to find those offending documents and fix their dates.
Is there any way to query something like:
{
    date: {
        $milliseconds: {
            $not: 0
        }
    }
}

In version 3.6 you can use the cursor from an aggregate query together with bulk updates to fix the matching documents.
Here is a shell sample.
var bulk = db.colname.initializeUnorderedBulkOp();
var count = 0;
var batch = 50; // Change batch size as you need
db.colname.aggregate([
    // Match documents whose millisecond part is non-zero; extend the $expr with
    // $second/$minute/$hour checks if those components can also be non-zero
    { $match: { $expr: { $ne: [{ $millisecond: "$date" }, 0] } } },
    // Rebuild the date from its year/month/day parts only, with the time zeroed out
    { $project: {
        date: { $dateFromParts: {
            year: { $year: "$date" },
            month: { $month: "$date" },
            day: { $dayOfMonth: "$date" }
        } }
    } }
]).forEach(function (doc) {
    bulk.find({ "_id": doc._id }).updateOne(
        { "$set": { "date": doc.date } }
    );
    count++;
    if (count == batch) {
        bulk.execute();
        bulk = db.colname.initializeUnorderedBulkOp();
        count = 0;
    }
});
// Flush any remaining queued updates
if (count > 0) {
    bulk.execute();
}
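If you only want to list the offending documents before fixing anything, the same $expr condition works in a plain find() on 3.6+. A minimal sketch, assuming the collection is named assets and the field is date as in the question (like the match above, it only catches non-zero milliseconds):
db.assets.find({ $expr: { $ne: [{ $millisecond: "$date" }, 0] } })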

Related

MongoDB convert string type to float type

Following the suggestions in MongoDB: How to change the type of a field?, I tried to update my collection to change the type of a field and its value.
Here is the update query
db.MyCollection.find({"ProjectID" : 44, "Cost": {$exists: true}}).forEach(function (doc) {
    if (doc.Cost.length > 0) {
        var newCost = doc.Cost.replace(/,/g, '').replace(/\$/g, '');
        doc.Cost = parseFloat(newCost).toFixed(2);
        db.MyCollection.save(doc);
    } // End of if condition
}) // End of forEach
Upon completion of the above query, when I run the following command:
db.MyCollection.find({"ProjectID" : 44},{Cost:1})
I still have the Cost field as a string:
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0915"),
    "Cost" : "11531.23"
}
/* 7 */
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0916"),
    "Cost" : "13900.64"
}
/* 8 */
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0917"),
    "Cost" : "15000.86"
}
What am I doing wrong here?
Here are some sample documents:
/* 2 */
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0911"),
    "Cost" : "$7,100.00"
}
/* 3 */
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0912"),
    "Cost" : "$14,500.00"
}
/* 4 */
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0913"),
    "Cost" : "$12,619.00"
}
/* 5 */
{
    "_id" : ObjectId("576919b66bab3bfcb9ff0914"),
    "Cost" : "$9,250.00"
}
The problem is that toFixed returns a String, not a Number, so you are just updating the document with a new (and different) String.
Example from Mongo Shell:
> number = 2.3431
2.3431
> number.toFixed(2)
2.34
> typeof number.toFixed(2)
string
If you want a number with 2 decimal places, you must parse it again, with something like:
db.MyCollection.find({"ProjectID" : 44, "Cost": {$exists: true}}).forEach(function (doc) {
    if (doc.Cost.length > 0) {
        var newCost = doc.Cost.replace(/,/g, '').replace(/\$/g, '');
        var costString = parseFloat(newCost).toFixed(2);
        doc.Cost = parseFloat(costString);
        db.MyCollection.save(doc);
    } // End of if condition
}) // End of forEach
Follow this pattern to convert a currency field of string type to a float. You need to query all the documents in the collection that have the Cost field as a string, and take advantage of the Bulk API for the updates. It offers better performance because the operations are sent to the server in batches (say, of 1000) rather than one request per document.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to update all the documents in the collection by changing the Cost fields to floating-point values:
var bulk = db.MyCollection.initializeUnorderedBulkOp(),
    counter = 0;
db.MyCollection.find({
    "Cost": { "$exists": true, "$type": 2 }
}).forEach(function (doc) {
    var newCost = Number(doc.Cost.replace(/[^0-9\.]+/g, ""));
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "Cost": newCost }
    });
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute(); // Execute per 1000 operations
        // re-initialize every 1000 update statements
        bulk = db.MyCollection.initializeUnorderedBulkOp();
    }
})
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which has deprecated the Bulk API and provided a newer set of APIs using bulkWrite().
It uses the same cursor as above but builds an array of bulk operations, using the same forEach() cursor method to push each bulk write document to the array. Because write commands can accept no more than 1000 operations, group your operations into batches of at most 1000 and re-initialise the array when the loop hits 1000 iterations:
var cursor = db.MyCollection.find({ "Cost": { "$exists": true, "$type": 2 } }),
    bulkUpdateOps = [];
cursor.forEach(function (doc) {
    var newCost = Number(doc.Cost.replace(/[^0-9\.]+/g, ""));
    bulkUpdateOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "Cost": newCost } }
        }
    });
    if (bulkUpdateOps.length == 1000) {
        db.MyCollection.bulkWrite(bulkUpdateOps);
        bulkUpdateOps = [];
    }
});
if (bulkUpdateOps.length > 0) { db.MyCollection.bulkWrite(bulkUpdateOps); }
Since MongoDB version 4.2, it can be done entirely inside one query using updates with an aggregation pipeline:
db.collection.updateMany(
    { Cost: { $exists: true } },
    [{ $set: {
        Cost: {
            $toDouble: {
                $reduce: {
                    input: { $split: [{ $substr: ["$Cost", 1, { $strLenCP: "$Cost" }] }, ","] },
                    initialValue: "",
                    in: { $concat: ["$$value", "$$this"] }
                }
            }
        }
    } }]
)
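As a quick sanity check afterwards (a minimal sketch, assuming the field is still called Cost), you can verify that no string values remain:
db.collection.find({ Cost: { $type: "string" } }).count() // should return 0 once the conversion succeeded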

How can I convert a string to a date with Mongo aggregation?

In a collection, I store this kind of document
{
"_id" : 1,
"created_at" : "2016/01/01 12:10:10",
...
}.
{
"_id" : 2,
"created_at" : "2016/01/04 12:10:10",
...
}
I would like to find documents that have "created_at" > 2016/01/01 by using the aggregation pipeline.
Does anybody have a solution to convert "created_at" to a date so it can be compared in the aggregation?
All the above answers use cursors; however, MongoDB recommends using the aggregation pipeline. With the new $dateFromString operator in MongoDB 3.6, it is pretty simple.
https://docs.mongodb.com/manual/reference/operator/aggregation/dateFromString/
db.collection.aggregate([
    { $project: { created_at: { $dateFromString: { dateString: '$created_at' } } } }
])
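To actually filter on the converted value, a $match stage can follow the conversion. A minimal sketch building on the same $dateFromString conversion as above (collection and field names as in the question):
db.collection.aggregate([
    { $project: { created_at: { $dateFromString: { dateString: '$created_at' } } } },
    { $match: { created_at: { $gt: new Date("2016-01-01") } } }
])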
As you have mentioned, you first need to change your schema so that the created_at field holds Date objects rather than strings, as is currently the case. Then you can query your collection using either the find() method or the aggregation framework; the former is the simpler approach.
To convert created_at to a date field, iterate the cursor returned by the find() method using the forEach() method; within the loop, convert the created_at field to a Date object and then update the field using the $set operator.
Take advantage of the Bulk API for the updates, which offers better performance because the operations are sent to the server in batches (say, of 1000) rather than one request per document.
The following demonstrates this approach. The first example uses the Bulk API, available in MongoDB versions >= 2.6 and < 3.2, to update all the documents in the collection by changing the created_at fields to date fields:
var bulk = db.collection.initializeUnorderedBulkOp(),
    counter = 0;
db.collection.find({"created_at": {"$exists": true, "$type": 2 }}).forEach(function (doc) {
    var newDate = new Date(doc.created_at);
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "created_at": newDate }
    });
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute(); // Execute per 1000 operations and re-initialize every 1000 update statements
        bulk = db.collection.initializeUnorderedBulkOp();
    }
})
// Clean up remaining operations in queue
if (counter % 1000 != 0) { bulk.execute(); }
The next example applies to MongoDB version 3.2, which has deprecated the Bulk API and provided a newer set of APIs using bulkWrite():
var cursor = db.collection.find({"created_at": {"$exists": true, "$type": 2 }}),
    bulkOps = [];
cursor.forEach(function (doc) {
    var newDate = new Date(doc.created_at);
    bulkOps.push({
        "updateOne": {
            "filter": { "_id": doc._id },
            "update": { "$set": { "created_at": newDate } }
        }
    });
    if (bulkOps.length === 1000) {
        db.collection.bulkWrite(bulkOps);
        bulkOps = [];
    }
});
if (bulkOps.length > 0) { db.collection.bulkWrite(bulkOps); }
Once the schema modification is complete, you can then query your collection for the date:
var dt = new Date("2016/01/01");
db.collection.find({ "created_at": { "$gt": dt } });
And should you wish to query using the aggregation framework, run the following pipeline to get the desired result. It uses the $match operator, which is similar to the find() method:
var dt = new Date("2016/01/01");
db.collection.aggregate([
    { "$match": { "created_at": { "$gt": dt } } }
])
If we have documents:
db.doc.save({ "_id" : 1, "created_at" : "2016/01/01 12:10:10" })
db.doc.save({ "_id" : 2, "created_at" : "2016/01/04 12:10:10" })
Simple query:
db.doc.find({ "created_at" : {"$lte": Date()} })
Aggregate query:
db.doc.aggregate([
    { "$match": { "created_at": { "$lte": Date() } } }
])
Date() is a method which returns the current date as a string.
new Date() is a constructor which returns a Date object using the ISODate() wrapper.
ISODate() is a constructor which also returns a Date object using the ISODate() wrapper.
More information about date types can be found in the MongoDB documentation.
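A quick check in the shell illustrates the difference (a minimal sketch; output abbreviated):
> typeof Date()
string
> typeof new Date()
object
> typeof ISODate()
object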

Mongodb aggregation $unwind then count

Here is my problem: in my Mongo database, I have a collection with items like:
{
    'id': 1,
    'steps': [
        {
            action: 'start',
            info: 'foo'
        },
        {
            action: 'stop',
            info: 'bar'
        }
    ]
}
I would like to get the total number of 'start' steps.
I tried to use the MongoDB aggregation framework: I use $unwind on steps.action and $match on steps.action to match 'start'.
However, I get too much data and reach the aggregation limit:
exception: aggregation result exceeds maximum document size (16MB). I don't need the data, I just want the count, but I couldn't find how to do it (tried with $group without success).
Thanks in advance.
If you want the count you can use this:
db.test.count({"steps.action": "start"})
but this will not take into account documents where steps contains multiple steps with action 'start' (each document is counted once).
When you need to count all steps with action 'start', you need to unwind the array, match on steps.action, and then group the results to count them:
db.test.aggregate([
    { $unwind: "$steps" },
    { $match: { "steps.action": "start" } },
    { $group: { _id: null, count: { $sum: 1 } } }
])
Try this:
db.collection.aggregate([
    { $unwind: "$steps" },
    { $match: { 'steps.action': 'start' } },
    { $group: { _id: null, count: { $sum: 1 } } }
]).pretty()
In MongoDB's aggregation framework, each pipeline stage has a 100MB memory restriction, while the result it produces (whether a single BSON document or documents written to a collection) has a maximum document size of 16MB.
So $match on the required condition only and then $group, so that only the required result, which is smaller than 16MB, is output.
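For instance, a minimal sketch of that idea for this question (assuming the collection is called collection as in the answer above): filter documents first, then group to a single small count document.
db.collection.aggregate([
    { $match: { "steps.action": "start" } }, // narrow down documents before unwinding
    { $unwind: "$steps" },
    { $match: { "steps.action": "start" } },
    { $group: { _id: null, count: { $sum: 1 } } } // single tiny result document
])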
You may not need aggregation for this simple query. See the code below.
for (var i = 10000; i >= 0; i--) {
    var a = {
        'id': 1,
        'steps': [
            {
                action: 'start',
                info: 'foo'
            },
            {
                action: 'stop',
                info: 'bar'
            }
        ]
    };
    a.id = i;
    var rnd = Math.floor((Math.random() * 3) + 1);
    if (rnd == 1) {
        a.steps[0].action = 'none';
    }
    if (rnd == 2) {
        a.steps.push({ action: 'start', info: 'foo' });
    }
    db.obj.insert(a);
}
This code creates a random number of actions.
If you only need the number of documents that contain action: 'start', then use the query below.
db.obj.count({"steps.action":"start"})
I get following count in my run.
> db.obj.count({"steps.action":"start"})
6756
But if you need the number of {action: 'start'} occurrences across the documents, then an aggregation query is needed.
Unwind, then match, then group:
db.obj.aggregate(
    [
        { $unwind: "$steps" },
        { $match: { "steps.action": "start" } },
        {
            $group: {
                _id: null,
                count: { $sum: 1 }
            }
        }
    ]
)
This outputs:
{ "_id" : null, "count" : 10054 }
If you get your exception again, use the allowDiskUse: true option:
db.obj.aggregate(
    [
        ....
    ],
    {
        allowDiskUse: true
    }
)

"too much data for sort()" on a small collection

When trying to do a find and sort on a mongodb collection I get the error below. The collection is not large at all - I have only 28 documents and I start getting this error when I cross the limit of 23 records.
The special thing about that document is that it holds a large ArrayCollection inside, but I am not fetching that specific field at all; I am only trying to get a DateTime field.
db.ANEpisodeBreakdown.find({creationDate: {$exists: true}}, {creationDate: true}).limit(23).sort({ creationDate: 1 })
{ "$err" : "too much data for sort() with no index. add an index or specify a smaller limit", "code" : 10128 }
So the problem here is the 32MB in-memory sort limit, and you have no index that can be used for an "index only" or "covered" query to get the result. Without that, your "big field" still gets loaded into the data being sorted.
Easy to replicate:
var string = "";
for (var n = 0; n < 10000000; n++) {
    string += 0;
}
for (var x = 0; x < 4; x++) {
    db.large.insert({ "large": string, "date": new Date() });
    sleep(1000);
}
So this query will blow up, unless you limit to 3:
db.large.find({},{ "date": 1 }).sort({ "date": -1 })
To overcome this, you have two options.
First, create an index on date (and other used fields) so the whole document is not loaded for your covered query:
db.large.ensureIndex({ "date": 1 })
db.large.find({},{ "_id": 0, "date": 1 }).sort({ "date": -1 })
{ "date" : ISODate("2014-07-07T10:08:33.067Z") }
{ "date" : ISODate("2014-07-07T10:08:31.747Z") }
{ "date" : ISODate("2014-07-07T10:08:30.391Z") }
{ "date" : ISODate("2014-07-07T10:08:29.038Z") }
Second, skip the index and use aggregate instead, as the $project stage there does not suffer the same limitation; the document is actually reshaped before being passed to $sort:
db.large.aggregate([
    { "$project": { "_id": 0, "date": 1 } },
    { "$sort": { "date": -1 } }
])
{ "date" : ISODate("2014-07-07T10:08:33.067Z") }
{ "date" : ISODate("2014-07-07T10:08:31.747Z") }
{ "date" : ISODate("2014-07-07T10:08:30.391Z") }
{ "date" : ISODate("2014-07-07T10:08:29.038Z") }
Either way gets you the results under the limit without modifying cursor limits in any way.
Without an index, the size you can use for a sort only extends over shellBatchSize which by default is 20.
DBQuery.shellBatchSize = 23;
This should do the trick.
The problem is that projection in this particular scenario still loads the entire document; it just sends it to your application without the large array field.
As such, MongoDB is still sorting with too much data for its 32MB limit.

How to limit number of updating documents in mongodb

How can I implement something similar to db.collection.find().limit(10), but while updating documents?
Right now I'm using something really crappy: getting documents with db.collection.find().limit() and then updating them.
In general, I want to take a given number of records and change one field in each of them.
Thanks.
You can use:
db.collection.find().limit(NUMBER_OF_ITEMS_YOU_WANT_TO_UPDATE).forEach(
    function (e) {
        e.fieldToChange = "blah";
        ....
        db.collection.save(e);
    }
);
(Credits for forEach code: MongoDB: Updating documents using data from the same document)
What this will do is change only the number of entries you specify. So if, for example, you want to add a field called "newField" with value 1 to only half of the entries inside "collection", you can run:
db.collection.find().limit(db.collection.count() / 2).forEach(
    function (e) {
        e.newField = 1;
        db.collection.save(e);
    }
);
If you then want to make the other half also have "newField" but with value 2, you can do an update with the condition that newField doesn't exist:
db.collection.update( { newField : { $exists : false } }, { $set : { newField : 2 } }, {multi : true} );
Using forEach to individually update each document is slow. You can update the documents in bulk using:
ids = db.collection.find(<condition>).limit(<limit>).map(
    function (doc) {
        return doc._id;
    }
);
db.collection.updateMany({ _id: { $in: ids } }, <update>)
The solutions that iterate over all objects then update them individually are very slow.
Retrieving them all then updating simultaneously using $in is more efficient.
ids = People.where(firstname: 'Pablo').limit(10000).only(:_id).to_a.map(&:id)
People.in(_id: ids).update_all(lastname: 'Cantero')
The query is written using Mongoid, but can be easily rewritten in Mongo Shell as well.
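A rough Mongo Shell equivalent might look like this (a sketch; the people collection and field names are assumptions carried over from the Mongoid example):
var ids = db.people.find({ firstname: "Pablo" }).limit(10000).map(function (doc) { return doc._id; });
db.people.update({ _id: { $in: ids } }, { $set: { lastname: "Cantero" } }, { multi: true });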
Unfortunately, the workaround you have is the only way to do it AFAIK. There is a boolean flag, multi, which will either update all the matches (when true) or only the first match (when false).
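For illustration, a minimal sketch of the multi flag (the collection and field names here are hypothetical):
// Updates only the first matching document (multi defaults to false)
db.collection.update({ status: "pending" }, { $set: { processed: true } })
// Updates all matching documents
db.collection.update({ status: "pending" }, { $set: { processed: true } }, { multi: true })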
As the answer above states, there is still no way to limit the number of documents to update (or delete) to a value > 1. A workaround is to use something like:
db.collection.find(<condition>).limit(<limit>).forEach(function (doc) {
    db.collection.update({ _id: doc._id }, { <your update> })
})
If your _id is a sequence number and not an ObjectId, you can do this in a for loop:
let batchSize = 10;
for (let i = 0; i <= 1000000; i += batchSize) {
    db.collection.update({ $and: [{ "_id": { $lte: i + batchSize } }, { "_id": { $gt: i } }] }, { <your update> })
}
// Collect the distinct values of key, keep only as many as should be updated,
// then update all documents whose key is in that slice
let fetchStandby = await db.model.distinct("key", {});
fetchStandby = fetchStandby.slice(0, no_of_docs_to_be_updated);
let fetch = await db.model.updateMany(
    { key: { $in: fetchStandby } },
    { $set: { "qc.status": "pending" } }
);
I also recently wanted something like this. I think querying for a long list of _id values just to update with $in is perhaps slow too, so I tried to use an aggregation with $merge:
while (true) {
    const record = db.records.findOne({ isArchived: false }, { _id: 1 })
    if (!record) {
        print("No more records")
        break
    }
    db.records.aggregate([
        { $match: { isArchived: false } },
        { $limit: 100 },
        {
            $project: {
                _id: 1,
                isArchived: { $literal: true },
                updatedAt: { $literal: new Date() }
            }
        },
        {
            $merge: {
                into: "records",
                on: "_id",
                whenMatched: "merge"
            }
        }
    ])
    print("Done update")
}
But feel free to comment on whether this is better or worse than a bulk update with $in.