Mongo Bulk Insert across multiple collections - mongodb

I see that mongo has bulk insert, but I see nowhere the capability to do bulk inserts across multiple collections.
Since I do not see it anywhere I'm assuming its not available from Mongo.
Any specific reason for that?

You are correct in that the bulk API operates on single collections only.
There is no specific reason but the APIs in general are collection-scoped so a "cross-collection bulk insert" would be a design deviation.
You can of course set up multiple bulk API objects in a program, each on a different collection. Keep in mind that while this wouldn't be transactional (in the startTrans-commit-rollback sense), neither is bulk insert.

Related

Firestore full collection update for schema change

I am attempting to figure out a solid strategy for handling schema changes in Firestore. My thinking is that schema changes would often require reading and then writing to every document in a collection (or possibly documents in a different collection).
Here are my concerns:
I don't know how large the collection will be in the future. Will I hit any limitations on how many documents can be read in a single query?
My current plan is to run the schema change script from Cloud Build. Is it possible this will timeout?
What is the most efficient way to do the actual update? (e.g. read document, write update to document, repeat...)
Should I be using batched writes?
Also, feel free to tell me if you think this is the complete wrong approach to implementing schema changes, and suggest a better solution.
I don't know how large the collection will be in the future. Will I hit any limitations on how many documents can be read in a single query?
If the number of documents gets too large to handle in a single query, you can start paginating the results.
My current plan is to run the schema change script from Cloud Build. Is it possible this will timeout?
That's impossible to say at this moment.
What is the most efficient way to do the actual update? (e.g. read document, write update to document, repeat...)
If you need the existing contents of a document to determine its new contents, then you'll indeed need to read it. If you don't need the existing contents, all you need is the path, and you can consider using the Node.js API to only retrieve the document IDs.
Should I be using batched writes?
Batched writes have no performance advantages. In fact, they're often slower than sending the individual update calls in parallel from your code.

How GraphQL handle "Two Phase Commits" in MongoDB?

MongoDB has a "Two Phase Commits" concept.
Operations on a single document are always atomic with MongoDB databases; however, operations that involve multiple documents, which are often referred to as “multi-document transactions”, are not atomic. Since documents can be fairly complex and contain multiple “nested” documents, single-document atomicity provides the necessary support for many practical use cases.https://docs.mongodb.com/v3.4/tutorial/perform-two-phase-commits/
Since as Schema in GraphQL create separate document, How GraphQL handle this?
GraphQL doesn't have any intrinsic notion of transactions or atomicity. The only statement along these lines is that multiple top-level fields in a single mutation are resolved serially with the expectation that later mutations will see side effects from previous ones. If you have multiple changes in a single GraphQL call and a later one fails, GraphQL says absolutely nothing about whether a first one should be rolled back.
If you're implementing a GraphQL schema in a way that requires changing multiple documents or records, it's up to you as an implementer (or possibly an intermediate library you're using) to provide whatever atomicity and consistency guarantees you need. GraphQL doesn't provide anything here.

Denormalization vs Parent Referencing vs MapReduce

I have a highly normalized data model with me. Currently I'm using manual referencing by storing the _id and running sequential queries to fetch details from the deepest collection.
The referencing is one-way and the flow has around 5-6 collections. For one particular use case, I'm having to query down to the deepest collection by querying subsequent "_id" from the higher level collections. So technically I'm hitting the database every time I run a
db.collection_name.find(_id: ****).
My prime goal is to optimize the read without hugely affecting the atomicity of the other collections. I have read about de-normalization and it does not make sense to me because I want to keep an option for changing the cardinality down the line and hence want to maintain a separate collection altogether.
I was initially thinking of using MapReduce to do an aggregation from the back and have a collection primarily for the particular use-case. But well even that does not sound that good.
In a relational db, I would be breaking the query in sub-queries and performing a join to get the data sets that intersect from the initial results. Since mongodb does not support joins, I'm having a tough time figuring anything out.
Please help if you have faced anything like this earlier or have any idea how to resolve it.
Denormalize your data.
MongoDB does not do JOIN's - period.
There is no operation on the database which gets data from more than one collection. Not find(), not aggregate() and not MapReduce. When you need to puzzle your data together from more than one collection, there is no other way than doing it on the application layer. For that reason you should organize your data in a way that any common and performance-relevant query can be resolved by querying just a single collection.
In order to do that you might have to create redundancies and transitive dependencies. This is normal in MongoDB.
When this feels "dirty" to you, then you should either accept the fact that your performance will be sub-optimal or use a different kind of database, like a classic relational database or a graph database.

MongoDB Chain queries, pseudo transactions

I understand you cannot do transactions in MongoDB and the thinking is that its not needed because everything locks the whole database or collection, I am not sure which. However how then do you perform the following?
How do I chain together multiple insert, update, delete or select queries in mongodb so that other queries that might operate on the same data wait until these queries finish? An analogy would be serialization transaction isolation in ms sql server.
more..
I want to insert/update record into collection A and update a record in collection B and then read Collection A and B but I don't want anyone (process or thread) to read or write to collection A or B until BOTH A and B have been updated or inserted by the first queries.
Yes, that's absolutely possible.
It is called ordered bulk operations on planet Mongo and works like this in the mongo shell:
bulk = db.emptyCollection.initializeOrderedBulkOp()
bulk.insert({name:"First document"})
bulk.find({name:"First document"})
.update({$set:{name:"First document, updated"}})
bulk.execute()
bulk.findOne()
> {_id: <someObjectId>, name:"First document, updated"}
Please read the manual regarding Bulk Write Operations for details.
Edit: Somehow is misread your question. It isn't possible for two collections. Remember though, that you can have different documents in one collection. Some ODMs even allow to have different models saved to the same collection. Exploiting this, you should be able to achieve what you want using the above bulk operations. You may want to combine this with locking to prevent writing. But preventing reading and writing would be the same as an transaction in terms of global and possibly distributed locks.

does mongodb have the properties such as trigger and procedure in a relational database?

as the title suggests, include out the map-reduce framework
if i want to trigger an event to run a consistency check or security operations before a record is inserted, how can i do that with MongoDB?
MongoDB does not support triggers, but people have created solutions around them, mostly using the oplog, though this will only help you if you are running with replica sets, as the oplog is a capped collection that keeps track of data changes for the purposes of replication.
For a nodejs solution see: https://www.npmjs.org/package/mongo-watch or see an earlier SO thread: How to listen for changes to a MongoDB collection?
If you are concerned with consistency, read about write concern in mongoDB. http://docs.mongodb.org/manual/core/write-concern/ You can be as relaxed or as strict as you want by setting insert write concern levels, from fire and hope to getting an acknowledgement from all members of the replica set.
So, if you want to run a consistency check before inserting data, you probably will have to move that logic to the client application and set your write concern level to a level that will ensure consistency.
MongoDb does not have triggers or stored procedures. While there are solutions that some have used to try to emulate the behavior, as it is not a built-in feature, you'll need to decide whether the solutions are effective for you. Searching for "triggers and mongodb" should find dozens. All depend on the oplog and replicas.
But, given the nature of MongoDb and a typical 3 tier architecture, I would expect that at the point of data insertion, which could be on a web server for example, you would run, on the web server, the necessary consistency and security checks. You wouldn't allow a client such as a mobile application to directly set data into the database collection without some checks.
Many drivers for MongoDb and extended libraries have validation and consistency checks built in already, so there is less to do. Using unique indexes for some fields can also provide a level of consistency that you cannot do from the driver alone. Look at calls like findAndModify which make atomic updates.