I am working on processing the MongoDB oplog, and I create a collection in MongoDB to store the processed data. I don't want this collection to generate oplog entries in turn.
I want all other collections to generate oplog entries, but I need to exclude this one collection. How can I achieve this? Is there any setting to tell MongoDB not to generate oplog entries for a collection?
Any help is appreciated.
Collections in the local database are not part of replication. So, if you create a collection in the local database and insert records into it, no oplog entries are created.
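For example, a minimal mongo shell sketch (the collection name processedOplog is just an illustration):

```javascript
// Write processed data into a collection in the "local" database.
// The "local" database is never replicated, so this insert produces
// no entry in local.oplog.rs.
const localDb = db.getSiblingDB("local");
localDb.processedOplog.insertOne({ ts: new Date(), payload: "processed entry" });
```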
I want to convert the MongoDB local oplog entries into actual, real queries so I can execute those queries and get an exact copy of the database.
Is there any package, file, built-in tool, or script for it?
It's not possible to get the exact query from the oplog entry because MongoDB doesn't save the query.
The oplog has an entry for each atomic modification performed. Multi-inserts/updates/deletes performed on the mongo instance using a single query are converted to multiple entries and written to the oplog collection. For example, if we insert 10,000 documents using Bulk.insert(), 10,000 new entries will be created in the oplog collection. Now the same can also be done by firing 10,000 Collection.insertOne() queries. The oplog entries would look identical! There is no way to tell which one actually happened.
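For illustration, you can inspect the insert entries yourself; a sketch follows (the field values shown in the comments are representative, not exact):

```javascript
// Run against a replica set member; shows one insert entry from the oplog.
db.getSiblingDB("local").oplog.rs.find({ op: "i" }).limit(1).pretty()
// An entry looks roughly like:
// {
//   ts: Timestamp(1660000000, 1),  // when the operation was applied
//   op: "i",                       // operation type: insert
//   ns: "mydb.mycoll",             // target namespace
//   o:  { _id: ObjectId("..."), name: "Alice" }  // the inserted document
// }
// A bulk insert of 10,000 documents and 10,000 single inserts both
// produce 10,000 entries of exactly this shape.
```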
Sorry, but that is impossible.
The reason is that the oplog doesn't contain queries. The oplog includes only the changes (inserts, updates, deletes) made to the data, and it exists for replication and recovery.
Getting an exact copy of a database is called "replication", and that is of course supported by the system.
To replicate changes to, for example, a single database or collection, you can use change streams: https://www.mongodb.com/docs/manual/changeStreams/.
You can reconstruct the operations from the oplog. The oplog defines multiple op types, for instance op: "i", "u", "d" for insert, update, and delete. For these types, check the "o"/"o2" fields, which hold the corresponding documents and filters.
Then, based on the op type, call the corresponding driver API: db.collection.insert()/update()/delete(). A rough sketch follows.
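A rough mongo shell sketch of that replay loop, assuming a replica set and a single example namespace mydb.mycoll (note the exact shape of update entries varies across server versions, so treat this as a starting point, not a complete replayer):

```javascript
// Replay oplog entries against a target collection, by op type.
const target = db.getSiblingDB("mydb").mycoll;

db.getSiblingDB("local").oplog.rs
  .find({ ns: "mydb.mycoll", op: { $in: ["i", "u", "d"] } })
  .forEach(function (entry) {
    if (entry.op === "i") {
      target.insert(entry.o);              // "o" holds the inserted document
    } else if (entry.op === "u") {
      target.update(entry.o2, entry.o);    // "o2" = filter, "o" = update
    } else if (entry.op === "d") {
      target.remove(entry.o);              // "o" holds the _id filter
    }
  });
```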
I am working on an app where I am looking into creating MongoDB collections on the fly as they are needed.
The app consumes data from a data source and maps the data to a collection. If the collection does not exist, the app:
creates the collection
kicks off appropriate indexes in the background
shards the collection
and then inserts the data into the collection.
While this is going on, other processes will be reading and writing from the database.
Looking at this MongoDB locking FAQ, it appears that reads and writes in other collections of the database should not be affected by the dynamic collection creation and setup, i.e. they won't end up waiting on a lock we acquired to create the collection.
Question: Is the above assumption correct?
Thank you in advance!
No. When you insert into a collection which does not exist, a new collection is created automatically. Of course, this new collection does not have any indexes (apart from _id) and is not sharded.
So, you must ensure the collection is created before the application starts any inserts.
However, it is no problem to create indexes and enable sharding on a collection which already contains data.
Note that when you enable sharding on an empty collection, or one with a low amount of data, all data is initially written to the primary shard. You may use sh.splitAt() to pre-split for the upcoming data, as in the sketch below.
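For example, a sketch of that setup order in the mongos shell (database, collection, and shard-key names are illustrative):

```javascript
// Create the collection's index and shard it before the app inserts data.
sh.enableSharding("appdb");
db.getSiblingDB("appdb").events.createIndex({ customerId: 1 });
sh.shardCollection("appdb.events", { customerId: 1 });

// Pre-split at expected key boundaries so initial writes are not all
// routed to the primary shard.
sh.splitAt("appdb.events", { customerId: 1000 });
sh.splitAt("appdb.events", { customerId: 2000 });
```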
I am considering using MongoDB triggers for some data maintenance.
I will need to:
rename collection
create new collection
copy index from one collection to another
Is this possible using triggers?
I couldn't find anything about it in the documentation yet.
We are doing a migration from one collection (old collection) to another MongoDB collection (new collection) in Azure Cosmos DB. During this migration we did not create indexes on the new collection; we just migrated all the documents over. Can we create the indexes on the new collection now, or should they have been created earlier, before the migration (before loading the documents)? Both the old and the new collection are sharded. The size of the new collection is 211 GB, so creating an index will account for some memory consumption (index size). So I would like to know: is there any impact if we create indexes after loading the documents? Are we good to create indexes after loading?
Yes, you can create indexes after the migration as well without any issues.
Please check this documentation for the commands to create various index types.
The commands to track progress of the indexing operation are also provided on the same document.
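For example, a generic mongo shell sketch (the collection and field names are illustrative; for the Cosmos-specific progress commands, use the linked documentation):

```javascript
// Create the index after the data has been loaded.
db.newCollection.createIndex({ orderDate: 1 });

// From another session, check in-progress index builds.
db.currentOp({ "command.createIndexes": { $exists: true } });
```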
I have 2 collections of similar documents (i.e. same structure, different values). One collection (X) is unsharded and in database A; the other collection (Y) is sharded and in database B. When I try to copy collection X into database B, I get an error saying "Shared throughput collection should have a partition key". I also tried copying the data using a forEach insert, but it takes too long.
So my question is: how can I append the data from collection X to collection Y in an efficient way?
The MongoDB version on CosmosDB is 3.4.6.
You may perform an aggregation and add the $merge operator as its last stage.
| $merge | $out |
| --- | --- |
| Can output to a sharded collection. | Cannot output to a sharded collection. |
| Input collection can also be sharded. | Input collection, however, can be sharded. |
https://docs.mongodb.com/manual/reference/operator/aggregation/merge/#comparison-with-out
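A sketch of such a pipeline, using the database and collection names from the question (note that $merge requires MongoDB 4.2+):

```javascript
// Append documents from unsharded A.X into sharded B.Y.
db.getSiblingDB("A").X.aggregate([
  {
    $merge: {
      into: { db: "B", coll: "Y" },
      on: "_id",                    // match documents on _id
      whenMatched: "keepExisting",  // don't overwrite existing documents
      whenNotMatched: "insert"      // append new documents
    }
  }
]);
```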
> So my question is, how can I append the data from collection X to collection Y in efficient way?
The server tools mongodump and mongorestore can be used. You can export the source collection data into BSON dump files and import them into the target collection. These processes are very quick, because the data in the database is already in BSON format.
Data can be exported from a non-sharded collection and imported into a sharded collection using these tools. In this case, it is required that the source collection has the shard-key field (or fields) with values. Note that the indexes from the source collection are also exported and imported by these tools.
Here is an example of the scenario in question:
mongodump --db=srcedb --collection=srcecoll --out="C:\mongo\dumps"
This creates a dump directory named after the database. There will be a "srcecoll.bson" file in it, which is used for importing.
mongorestore --port 26xxxx --db=trgtdb --collection=trgtcoll --dir="C:\mongo\dumps\srcecoll.bson"
The host/port connects to the mongos of the sharded cluster. Note that the bson file name needs to be specified in the --dir option.
The import adds data and indexes to the existing sharded collection. The process only inserts data; existing documents are not updated. If an _id value from the source collection already exists in the target collection, the process will not overwrite that document (it simply is not imported, and this is not an error).
There are some useful options for mongorestore, like --noIndexRestore and --dryRun.
Because the MongoDB version on CosmosDB is currently 3.4.6, it doesn't support $merge and a lot of other commands, such as collection.copyTo(), etc. Using Studio 3T's import feature didn't help either.
The solution I use is to download the target collection to my local MongoDB, clean it, then write Java code that reads my clean data from the local DB and insertMany() (or bulkWrite()) it into the target collection. This way, the data is appended to the target collection.
The speed I measured was 2 hours for 1M documents (~750 MB); of course, these numbers may vary depending on various factors, e.g. network, document size, etc.
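A rough mongo shell equivalent of that approach (the answer used Java; the connection string and names here are placeholders, and a real Cosmos connection would also need credentials and TLS options):

```javascript
// Read cleaned documents from the local DB and bulk-insert them into
// the remote target collection in batches.
const source = db.getSiblingDB("cleandb").getCollection("X");
const target = new Mongo("mongodb://target-host:10255")
                 .getDB("B").getCollection("Y");

let batch = [];
source.find().forEach(function (doc) {
  batch.push(doc);
  if (batch.length === 1000) {     // flush every 1000 documents
    target.insertMany(batch);
    batch = [];
  }
});
if (batch.length > 0) target.insertMany(batch);  // flush the remainder
```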