I'm trying to write about 1 million rows to my Mongo collection, but it takes too long (in fact, it never finishes).
Looking at the Mongo log, I can see that a separate insert is issued for every row; there is no bulk operation.
Does Alteryx support bulk insert for Mongo?
I'm using Alteryx 10.1 and MongoDB 3.4
At the moment the answer is no: Alteryx doesn't support bulk insert for Mongo.
Related
I understand that MongoDB has the ability to do bulkWrite / bulkExec per collection... But what I don't understand is how to do it across the whole database...
Currently I'm doing things in parallel via Promise.all([collectionA.op1,collectionB.op2...]) but this seems incredibly inefficient as it's doing a new network request to Mongo for each operation.
It seems that if I could bulk up all the instructions I have and just send them to Mongo to operate on it would be much more efficient.
Does MongoDB support this? If not, why wouldn't it?
I am trying out Apache Drill to execute a query on a Mongo connection. Simple COUNT(1) queries are taking too long, on the order of 20 seconds per query. When I connect using any other Mongo connector and run the same query, it takes milliseconds. I have also seen people online mention their Mongo queries taking 2 seconds. I can live with 2 seconds, but 20 is too much.
Here is the query:
select count(*) from mongo.test.contacts
Here is the Query Profile for the query.
It seems that some optimizations should be applied for your case. It would be very helpful if you created a Jira ticket [1] with these details:
the DDL for the MongoDB table, the MongoDB version, and information from the log files (because it is not clear what Drill was doing all this time).
A simple reproduction of your case would help resolve the issue more quickly.
Thanks.
[1] https://issues.apache.org/jira/projects/DRILL/issues/
I'm using Mongo and I have multiple queries to insert at a time, so I use a for loop to insert into the db. The problem is that each query falls under a key, so I check whether the key exists or not: if it doesn't, I add it to the db; if it does, I append to it. If I have multiple queries with the same key, then (since Mongo inserts asynchronously) both could see the key as "nonexistent" in the db because they could be running in parallel. Is there a way around this?
If you're writing a lot of documents, you're probably better off using bulk operations in Mongo: https://docs.mongodb.com/manual/core/bulk-write-operations/
You can write the queries as upserts. I think this question is very similar to what you are trying to accomplish: "How to properly do a Bulk upsert/update in MongoDB".
If you do it as an ordered bulk operation you should not have the problem with two queries running simultaneously.
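As a rough sketch of the upsert idea, the incoming items can be grouped by key first, so that each key produces exactly one upsert and two writes for the same key never race. The field names `_id` and `values` and the helper `build_upserts` are assumptions made for illustration, not something from the question. The function only builds plain (filter, update) pairs, so it runs without a database:

```python
from collections import defaultdict


def build_upserts(items):
    """Group (key, value) pairs so that each key yields exactly one upsert."""
    grouped = defaultdict(list)
    for key, value in items:
        grouped[key].append(value)
    # One (filter, update) pair per key. "$push" with "$each" appends all the
    # values in a single operation; run as an upsert, it creates the document
    # when the key does not exist yet.
    return [
        ({"_id": key}, {"$push": {"values": {"$each": values}}})
        for key, values in grouped.items()
    ]


ops = build_upserts([("a", 1), ("b", 2), ("a", 3)])

# With pymongo, each pair would typically become an UpdateOne request:
#   from pymongo import UpdateOne
#   collection.bulk_write(
#       [UpdateOne(f, u, upsert=True) for f, u in ops], ordered=True)
```

Grouping on the client is a design choice rather than a requirement; sending one upsert per item in an ordered bulk operation should also avoid the race, at the cost of more operations.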
What is the performance gain from using bulk inserts vs. regular inserts in MongoDB, and in pymongo specifically? Are bulk inserts just a wrapper for regular inserts?
Bulk inserts are not wrappers for regular inserts. A bulk insert operation sends many documents as a whole, so it saves that many database round trips. It is much more performant because you don't have to send each document over the network separately.
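To make the round-trip saving concrete, here is a minimal sketch that sends documents in chunks instead of one at a time. It assumes only that the collection object has an `insert_many(documents, ordered=...)` method, as pymongo collections do. `insert_in_batches` and `CountingCollection` are hypothetical names; the latter is a stand-in so the example runs without a server:

```python
from itertools import islice


def insert_in_batches(collection, documents, batch_size=1000):
    """Send documents in chunks and return the number of insert_many calls."""
    iterator = iter(documents)
    calls = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        # ordered=False lets the server continue past individual failures.
        collection.insert_many(batch, ordered=False)
        calls += 1
    return calls


class CountingCollection:
    """Hypothetical stand-in for a pymongo collection, for demonstration."""

    def __init__(self):
        self.stored = []

    def insert_many(self, docs, ordered=True):
        self.stored.extend(docs)


fake = CountingCollection()
round_trips = insert_in_batches(
    fake, ({"n": i} for i in range(2500)), batch_size=1000)
```

Here 2,500 documents go out in 3 calls rather than 2,500. With a real driver the number of network messages per call can differ, since the driver may split very large batches itself.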
@dsmilkov That's an advantage, as you don't have to open a connection every single time.
What is actually happening behind the scenes with a big InsertBatch if
one is writing to a sharded cluster? Does MongoDB actually support
bulk insert, or is InsertBatch actually inserting one at a time at
the server level? How does this work with sharding then? Does this
mean that a mongos will look at every item in the batch to figure out
the shard key of each item and then route it to the right
server? This would break bulk insert, if it exists, and does not seem to
be efficient. What are the mechanics of InsertBatch for a sharded
solution? I am using version 2.0 and am willing to upgrade if that makes any difference.
Bulk inserts are an actual MongoDB feature and are (somewhat) more performant than separate per-document inserts because they need fewer round trips.
In a sharded environment if mongos receives a bulk insert it will figure out which part of the bulk has to be sent to which shard. There are no differences between 2.0 and 2.1 and it is the most efficient way to bulk insert data into a sharded database.
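The splitting step described above can be pictured with a toy model. This is not how mongos is implemented; it is only a sketch that assumes range-based chunks with sorted split points. The shard names, the `user_id` shard key, and `split_batch_by_shard` are all made up for illustration:

```python
from bisect import bisect_right
from collections import defaultdict


def split_batch_by_shard(batch, shard_key, boundaries, shards):
    """Group documents by the shard that owns each document's key range.

    boundaries is a sorted list of split points, and shards must hold
    len(boundaries) + 1 names, one per resulting range.
    """
    per_shard = defaultdict(list)
    for doc in batch:
        # Find which range the document's shard key value falls into.
        index = bisect_right(boundaries, doc[shard_key])
        per_shard[shards[index]].append(doc)
    return dict(per_shard)


batch = [{"user_id": 5}, {"user_id": 150}, {"user_id": 42}, {"user_id": 999}]
routed = split_batch_by_shard(
    batch, "user_id", [100, 500], ["shardA", "shardB", "shardC"])
```

In this model each shard receives a single sub-batch, so the per-document work is a cheap key lookup rather than a separate insert per document.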
If you're curious about how exactly mongos works, have a look at its source code here:
https://github.com/mongodb/mongo/tree/master/src/mongo/s