MongoDB FindAndModify extremely slow - mongodb

I'm using MongoDB and have some trouble with speed. My collection has grown and now contains about 7,000,000 items. As a result, the findAndModify query takes about 3 seconds. I have an index on the queried field (in my case "links", which is an array). Does anybody see a major mistake or inefficient code (see below)?
public Cluster findAndLockWithUpsert(String url, String lockid) {
Query query = Query.query(Criteria.where("links").in(Arrays.asList(url)));
Update update = new Update().push("lock", lockid).push("links", url);
FindAndModifyOptions options = new FindAndModifyOptions();
options.remove(false);
options.returnNew(true);
options.upsert(true);
Cluster result = mongo.findAndModify(query, update, options, Cluster.class, COLLECTION);
return result;
}
thank you in advance!

Related

Getting previous page's results when using skip and limit while querying DocumentDB

I have a collection with ~800,000 documents, and I am trying to fetch all of them, 5,000 at a time.
When running the following code:
const CHUNK_SIZE = 5000;
let skip = 0;
let matches; // declared outside the loop so the while condition can read it
do {
matches = await dbClient
.collection(collectionName)
.find({})
.skip(skip)
.limit(CHUNK_SIZE)
.toArray();
// ... some processing
skip += CHUNK_SIZE;
} while (matches.length)
After about 30 iterations, I start getting documents I already received in a previous iteration.
What am I missing here?
As posted in the comments, you'll have to apply a .sort() on the query.
To do so without adding too much performance overhead it would be easiest to do this on the _id e.g.
.sort(
{
"_id" : 1.0
}
)
Neither MongoDB nor Amazon DocumentDB guarantees a result order unless you specify a sort.
Amazon DocumentDB
Mongo Result Ordering
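To illustrate why a deterministic order matters, here is a minimal, driver-free sketch in Java (the class and method names are hypothetical, and an in-memory list stands in for the collection). Because the list is sorted by id before each page is cut, every element shows up exactly once across the pages, which is the property the .sort() on _id gives the skip/limit loop from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortedPagingSketch {

    // Returns one page of ids: sort first, then apply skip and limit,
    // mirroring find().sort({_id: 1}).skip(skip).limit(limit).
    static List<Integer> page(List<Integer> ids, int skip, int limit) {
        List<Integer> sorted = new ArrayList<>(ids);
        Collections.sort(sorted);
        int from = Math.min(skip, sorted.size());
        int to = Math.min(from + limit, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    // Collects pages until an empty page comes back, like the do/while loop in the question.
    static List<Integer> fetchAll(List<Integer> ids, int chunkSize) {
        List<Integer> seen = new ArrayList<>();
        int skip = 0;
        List<Integer> matches;
        do {
            matches = page(ids, skip, chunkSize);
            seen.addAll(matches);
            skip += chunkSize;
        } while (!matches.isEmpty());
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(5, 3, 9, 1, 7, 2, 8);
        System.out.println(fetchAll(ids, 3)); // prints [1, 2, 3, 5, 7, 8, 9]
    }
}
```

Without the sort step, the order in which a server returns documents may differ between calls, so consecutive skip windows can overlap or miss documents.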

Iterating over MongoDB collection to duplicate all documents is painfully slow

I have a collection of 7,000,000 documents (each of perhaps 1-2 KB BSON) in a MongoDB collection that I would like to duplicate, modifying one field. The field is a string with a numeric value, and I would like to increment the field by 1.
From the Mongo shell, I took the following approach:
> var all = db.my_collection.find()
> all.forEach(function(it) {
... it._id = 0; // to force mongo to create a new objectId
... it.field = (parseInt(it.field) + 1).toString();
... db.my_collection.insert(it);
... })
Executing the above code is taking an extremely long time. At first I thought the code was broken somehow, but when I checked the status of the collection from a separate terminal about an hour later, the process was still running and there were now 7,000,001 documents! Sure enough, there was exactly 1 new document that matched the incremented field.
For context, I'm running a 2015 MBP with 4 cores and 16 GB ram. I see mongo near the top of my CPU overhead averaging about 85%.
1) Am I missing a bulk modify/update capability in Mongodb?
2) Any reason why the above operation would be working, yet working so slowly that it is updating a document at a rate of 1/hr?
Try the db.collection.mapReduce() way:
NB: A single emit can only hold half of MongoDB’s maximum BSON document size.
var mapFunction1 = function() {
emit(ObjectId(), (parseInt(this.field) + 1).toString());
};
MongoDB will not call the reduce function for a key that has only a single value.
var reduceFunction1 = function(id, field) {
return field;
};
Finally,
db.my_collection.mapReduce(
mapFunction1,
reduceFunction1,
{"out":"my_collection"} //Replaces the entire content; consider merge
)
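I'm embarrassed to say that I was mistaken in assuming that this line:
... it._id = 0; // to force mongo to create a new objectId
forces mongo to create a new ObjectId. It does not: every copy gets the literal _id 0, so only the first insert succeeds and the rest collide on the duplicate key. Instead I needed to be explicit: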
... it._id = ObjectId();
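The effect can be reproduced without a database. The following is a minimal Java sketch under stated assumptions: a map keyed by id stands in for the collection, putIfAbsent stands in for an insert that rejects duplicate _id values, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

final class DuplicateIdSketch {

    // Copies every value with its numeric string incremented by 1, taking the
    // new key from idSource. Returns how many copies were actually stored;
    // a copy whose key already exists is rejected, like a duplicate _id.
    static int duplicate(Map<Object, String> collection, Supplier<Object> idSource) {
        List<String> snapshot = new ArrayList<>(collection.values());
        int inserted = 0;
        for (String field : snapshot) {
            String incremented = Integer.toString(Integer.parseInt(field) + 1);
            if (collection.putIfAbsent(idSource.get(), incremented) == null) {
                inserted++;
            }
        }
        return inserted;
    }

    // Small sample "collection" with three documents.
    static Map<Object, String> sample() {
        Map<Object, String> collection = new LinkedHashMap<>();
        collection.put("a", "10");
        collection.put("b", "20");
        collection.put("c", "30");
        return collection;
    }

    public static void main(String[] args) {
        System.out.println(duplicate(sample(), () -> 0));           // prints 1
        System.out.println(duplicate(sample(), UUID::randomUUID));  // prints 3
    }
}
```

With the fixed id only one copy is stored, which matches the single new document observed in the question; with a fresh id per copy, all of them are stored.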

Faster way to get data out of mongo results than field-by-field?

Is there a faster way to parse mongo results back into data structures than doing it field by field as below? I'm currently doing this and it's very slow.
mongocxx::cursor cursor = m_coll.find(<some-query>);
for (bsoncxx::document::view doc : cursor)
{
RecreatedObj readback;
readback.Area = doc["Area"].get_int32();
readback.fieldOne = doc["fieldOne"].get_double();
readback.fieldTwo = doc["fieldTwo"].get_double();
readback.fieldThree = doc["fieldThree"].get_int32();
...
}
Field lookup is O(N), so you've unwittingly written an O(N^2) algorithm.
A faster way would be to iterate over the fields in the document view and do your assignments in a switch/case on field name.
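The single-pass idea can be sketched independently of the driver. The example below is in Java purely for illustration and rests on assumptions: an ordered list of name/value entries stands in for the BSON document view, and the class and field names are hypothetical stand-ins for RecreatedObj.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SinglePassDecodeSketch {

    // Hypothetical target type, mirroring RecreatedObj from the question.
    static final class RecreatedObj {
        int area;
        double fieldOne;
        double fieldTwo;
        int fieldThree;
    }

    // Visits each field exactly once and dispatches on its name,
    // instead of searching the whole document once per wanted field.
    static RecreatedObj decode(List<Map.Entry<String, Object>> doc) {
        RecreatedObj out = new RecreatedObj();
        for (Map.Entry<String, Object> field : doc) {
            switch (field.getKey()) {
                case "Area":       out.area = (Integer) field.getValue(); break;
                case "fieldOne":   out.fieldOne = (Double) field.getValue(); break;
                case "fieldTwo":   out.fieldTwo = (Double) field.getValue(); break;
                case "fieldThree": out.fieldThree = (Integer) field.getValue(); break;
                default: break; // ignore fields the target type does not need
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> doc = new ArrayList<>();
        doc.add(new SimpleEntry<String, Object>("Area", 42));
        doc.add(new SimpleEntry<String, Object>("fieldOne", 1.5));
        doc.add(new SimpleEntry<String, Object>("fieldTwo", 2.5));
        doc.add(new SimpleEntry<String, Object>("fieldThree", 7));
        RecreatedObj obj = decode(doc);
        System.out.println(obj.area + " " + obj.fieldOne + " " + obj.fieldTwo + " " + obj.fieldThree);
    }
}
```

Each document is walked once regardless of how many fields are extracted, so the work grows linearly with the number of fields rather than quadratically.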

how to call count operation after find with mongodb java driver

I am using MongoDB 3.0. Suppose there is a collection of documents named photos, with this structure:
{"_id" : 1, photographer: "jack"}
Calling database.getCollection("photos") returns a MongoCollection object, which has a count() method that returns the number of documents in the collection.
However, I also make queries with specific conditions, for example to find documents with an id smaller than 100:
photosCollections.find(Document.parse("{_id : {$lt : 100}}"))
The find method above always returns a cursor, which doesn't provide a count() function. So how can I know how many documents were returned? I know that on the command line I can use
db.photos.find({_id : {$lt : 100}}).count()
Of course, I can go through the iterator and count the documents myself, but I find that really clumsy. Does the MongoDB Java driver provide a way to count the documents returned by the find() method? If not, what is the reason behind that decision?
As you said, MongoCollection has a count() method that returns the number of documents in the collection, but it also has a count(Bson filter) overload that returns the number of documents matching the given filter.
So you can just use:
long count = photosCollections.count(Document.parse("{_id : {$lt : 100}}"));
or maybe clearer:
Document query = new Document("_id", new Document("$lt", 100));
long count = photosCollections.count(query);
ref: http://api.mongodb.com/java/3.3/com/mongodb/client/MongoCollection.html#count-org.bson.conversions.Bson-
In MongoDB 3.4 you can only use the Iterator of FindIterable to get the count of the documents returned by a filter. e.g.
FindIterable findIterable =
mongoCollection.find(Filters.eq("EVENT_TYPE", "Sport"));
Iterator iterator = findIterable.iterator();
int count = 0;
while (iterator.hasNext()) {
iterator.next();
count++;
}
System.out.println(">>>>> count = " + count);
I had a similar problem. I am using MongoCollection instead of DBCollection, as that is what the MongoDB 3.2 guide uses. MongoCollection hasn't got a count() method, so I think the only option is to use the iterator to count them.
In my case I only needed to know if any document has been returned, I am using this:
if(result.first() != null)
Bson bson = Filters.eq("type", "work");
List<Document> list = collection.find(bson).into(new ArrayList<>());
System.out.println(list.size());
The into(A) method (where A is a collection type) iterates over all the documents and adds each one to the given target. We can then get the count of the returned documents from the target's size.
The API docs clearly state that DBCursor Object provides a count method:
MongoClient client = new MongoClient(MONGOHOST,MONGOPORT);
DBCollection coll = client.getDB(DBNAME).getCollection(COLLECTION);
DBObject query = QueryBuilder.start()
.put("_id").lessThan(100).get();
DBCursor result = coll.find(query);
System.out.println("Number of pictures found: " + result.count() );

MongoDB C# driver (Samus): Collection.Count<T>(func) or shell db.collection.find(condition).count() is very slow

I have an issue with the MongoDB C# driver (Samus). Could you please take a look?
using (Mongo mongo = new Mongo(config.BuildConfiguration()))
{
mongo.Connect();
try
{
var db = mongo.GetDatabase("MyCollection"); // the collection holds more than 500,000,000 documents
var collection = db.GetCollection<BasicData>();
Console.WriteLine("Count by LINQ on typed collection: {0}", collection.Linq().Count(x => x.Id > 1)); // fails with a timeout
Console.WriteLine("Count by not LINQ on typed collection: {0}", collection.Count()); //no condition is ok
Console.ReadKey();
}
finally
{
mongo.Disconnect();
}
}
Using the MongoDB shell:
db.collection.find(condition).count()
or
db.collection.count(condition); //very slow
If it's also slow in the shell then it doesn't have anything to do with the C# driver.
Use explain to determine whether your query is using the index you expect. If it's not using an index, any query over 500M docs is going to take some time. In the shell:
db.collection.find(condition).explain();
To speed up your query, you'll likely need to add an index that covers the fields referenced by your query condition.