Bulk insert into MongoDB through Lambda times out and inserts data 3 times

We need to insert data into MongoDB from S3. We wrote a Lambda function that simply reads a JSON file from S3 and executes insertMany using Mongoose. When we run this Lambda, the MongoDB insert takes roughly 7-10 minutes for 10K records. I need help with the following:
Improve the MongoDB insert so we can insert 20k records in under 5 minutes and avoid the Lambda timeout.
I am already using ordered: false to speed up the insert.

Using the native MongoDB client instead of Mongoose solves the problem. It seems insertMany performs better with the native MongoDB driver than with Mongoose.
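A minimal sketch of that approach with the native Node.js driver; the MONGODB_URI environment variable and the database and collection names are placeholders, and reading the file from S3 is omitted:
const { MongoClient } = require('mongodb');
// Reuse one client across Lambda invocations (MONGODB_URI is an assumed environment variable).
const client = new MongoClient(process.env.MONGODB_URI);
// docs: the array of documents parsed from the JSON file read from S3 (S3 read not shown).
async function bulkInsert(docs) {
  await client.connect();
  // ordered: false lets the server continue past individual errors and batch writes more freely.
  const result = await client.db('mydb').collection('records').insertMany(docs, { ordered: false });
  return result.insertedCount;
}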

Related

mongolite doesn't use the index and read queries are slow

I have created a MongoDB database using mongolite and I created an index on the _row key using the following command:
collection$index(add = '{"_row" : 1}')
When I query a document via the Robo3T program with the db.getCollection('collection').find({"_row": "ENSG00000197616"}) command, my index is used and it takes less than a second to return the data.
[Robo3T screenshot: note the query time]
This is also the case when I query the data using the pymongo package in Python.
[Python screenshot: note the query time]
Surprisingly, when I perform the same query with mongolite, it takes more than 10 seconds to query data:
system.time(collection$find(query = '{"_row": "ENSG00000197616"}'))
user system elapsed
12.221 0.005 12.269
I think this can only come from the mongolite package; otherwise, the query would be slow in the other programs as well.
Any input is highly appreciated!
I found the solution here:
https://github.com/jeroen/mongolite/issues/37
The time-consuming part is not the query itself but simplifying the result into a data frame.

Bulk insert in Mongodb and Postgresql

I am doing a comparison of PostgreSQL vs MongoDB on inserts. I am using the COPY command (from file to table) in PostgreSQL and I am wondering what the equivalent command is in MongoDB. I think the answer is insert_many(dict), but I want to be sure. Any advice would be appreciated. Thanks in advance!
If you want to insert multiple documents in MongoDB, you can use
db.collectionName.insertMany([{title: 'doc 1'}, {title: 'doc 2'}, ...])

Performance degradation with Mongo when using bulkWrite with upsert

I am using the Mongo Java driver 3.11.1 and MongoDB version 4.2.0 for my development. I am still learning Mongo. My application receives data and has to either insert a new document or replace the existing one, i.e. do an upsert.
Each document is 780-1000 bytes as of now and each collection can have more than 3 million records.
Approach 1: I tried using findOneAndReplace for each document and it was taking more than 15 minutes to save the data.
Approach 2: I changed it to bulkWrite using the code below, which resulted in ~6-7 minutes for saving 20,000 records.
List<Data> dataList;  // populated elsewhere with the incoming records
// One ReplaceOneModel per record; upsert(true) inserts the document when no match exists.
List<WriteModel<Document>> updates = new ArrayList<>();
ReplaceOptions updateOptions = new ReplaceOptions().upsert(true);
dataList.forEach(data -> {
    Document updatedDocument = new Document(data.getFields());
    updates.add(new ReplaceOneModel<>(eq("DataId", data.getId()), updatedDocument, updateOptions));
});
final BulkWriteResult bulkWriteResult = mongoCollection.bulkWrite(updates);
Approach 3: I tried using collection.insertMany, which takes 2 seconds to store the data.
As per the driver code, insertMany internally uses MixedBulkWriteOperation to insert the data, similar to bulkWrite.
My questions are:
a) I have to do an upsert operation; please let me know if I am making any mistakes.
- I created an index on the DataId field, but it made less than a 2-millisecond difference in performance.
- I tried using a writeConcern of W1, but performance is still the same.
b) Why is insertMany faster than bulkWrite? I could understand a difference of a few seconds, but I cannot figure out why insertMany takes 2-3 seconds while bulkWrite takes 5-7 minutes.
c) Are there any approaches that can be used to solve this situation?
This problem was solved to a great extent by adding an index on the DataId field. Previously I had created an index on the DataId field, but forgot to create the index after creating the collection.
This link, How to improve MongoDB insert performance, helped in resolving the problem.
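For reference, a minimal sketch of creating that index from the mongo shell (the collection name is a placeholder):
// Index the field used as the bulkWrite replace filter so each ReplaceOne
// finds its target document via the index instead of a collection scan.
db.myCollection.createIndex({ DataId: 1 })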

mongodb Temporary Collection or Aggregate on find result

I have to generate a dynamic report.
I have a complex query which returns results using db.collection.find(), and the collection has millions of records in it.
Now I want to perform aggregate operation on this result.
I tried inserting into a collection and then executing an aggregate function on it, as below:
db.users.find().forEach( function(myDoc) { db.usersDummy.insert(myDoc); } );
But it does not seem feasible to temporarily insert the data and then perform the aggregate operation on it.
Does MongoDB support temporary tables in any way, or can I perform the aggregate operation directly on the find result?
As suggested by JohnnyHK, I am using a $match stage in the aggregation to run against the huge collection directly instead of creating a dummy collection.
So I am closing this question.
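For reference, a minimal sketch of that $match approach in the shell; the filter and grouping fields here are placeholders:
// The leading $match replaces the separate find(), so the pipeline runs
// directly against the original collection and can still use its indexes.
db.users.aggregate([
  { $match: { status: "active" } },
  { $group: { _id: "$country", total: { $sum: 1 } } }
])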

How can I drop a mongo collection in Erlang with the mongodb-erlang driver?

How can I drop a MongoDB collection in Erlang using the mongodb-erlang driver (https://github.com/mongodb/mongodb-erlang)?
I didn't find anything in the docs: http://api.mongodb.org/erlang/mongodb/
I'm writing tests that create collections with different names and I want to drop them when the tests are finished. I can delete all the documents in a collection, but I want to drop the collection itself.
Use the mongo_query:command/3 function and the document form of the drop command:
1> mongo_query:command({Db, Conn}, {drop, 'foo.bar.baz'}, false).
{nIndexesWas,1.0,msg,<<"indexes dropped for collection">>,
ns,<<"foo.bar.baz">>,
ok,1.0}
It takes a regular connection, not a replset connection.
mongo_query:command/3 function:
http://api.mongodb.org/erlang/mongodb/mongo_query.html#command-3
Document form of MongoDB drop command function:
http://docs.mongodb.org/manual/reference/command/drop/#dbcmd.drop
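For comparison, the document form of the drop command issued from the mongo shell looks like this (using the same collection name as in the example above):
// Equivalent drop issued via runCommand in the mongo shell
db.runCommand({ drop: "foo.bar.baz" })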