I came across the temporal function "temporal.documentDelete", which "logically deletes" temporal documents in a MarkLogic database, removing them from the latest collection. However, the documents are still not physically deleted from the MarkLogic database; you can still retrieve a deleted document by its URI.
Is there any way I can also physically delete the temporal documents ingested into my MarkLogic database?
You can use temporal.documentWipe, but bear in mind that it will wipe all versions of that document. You would basically be rewriting history, which is against the nature of temporal.
Also note that you can only wipe documents whose protection has expired. You protect temporal documents using temporal.documentProtect.
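As a rough Server-Side JavaScript sketch (the temporal collection "kool", the URI "koala.json", and the 30-day duration are made-up examples, and the options object follows the documented level/duration form):

// In an update request: optionally protect the document. It cannot be
// wiped while a noWipe protection is in effect.
declareUpdate();
temporal.documentProtect("kool", "koala.json", { level: "noWipe", duration: "P30D" });

// Later, in a separate update request, once any protection has expired,
// physically remove every version of the document:
declareUpdate();
temporal.documentWipe("kool", "koala.json");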
More notes on deleting and wiping temporal documents can be found in the Temporal Guide:
http://docs.marklogic.com/guide/temporal/managing#id_10558
HTH!
I want to convert the entries in MongoDB's local oplog into the actual queries that produced them, so I can execute those queries and get an exact copy of the database.
Is there any package, file, built-in tool, or script for this?
It's not possible to get the exact query from the oplog entry because MongoDB doesn't save the query.
The oplog has an entry for each atomic modification performed. Multi-inserts/updates/deletes performed on the mongo instance using a single query are converted to multiple entries and written to the oplog collection. For example, if we insert 10,000 documents using Bulk.insert(), 10,000 new entries will be created in the oplog collection. Now the same can also be done by firing 10,000 Collection.insertOne() queries. The oplog entries would look identical! There is no way to tell which one actually happened.
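For illustration, this is roughly what the per-document insert entries look like in the local.oplog.rs collection ("test.users" is a made-up namespace, and the exact field layout varies by server version):

// Look at insert entries for one namespace:
db.getSiblingDB("local").getCollection("oplog.rs").find({ op: "i", ns: "test.users" }).limit(2)

// Each returned entry looks roughly like:
// { ts: Timestamp(1600000000, 1), op: "i", ns: "test.users",
//   o: { _id: ObjectId("..."), name: "alice" } }
// There is one such entry per inserted document, whether it came from a bulk
// insert or from individual insertOne() calls.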
Sorry, but that is impossible.
The reason is that the oplog doesn't contain queries. The oplog records only the resulting changes (inserts, updates, deletes) to the data, and it exists for replication and replay.
Getting an exact copy of a database is called "replication", and that is of course supported by MongoDB.
To "replicate" changes to, for example, a single database or collection, you can use change streams: https://www.mongodb.com/docs/manual/changeStreams/
You can reconstruct the operations from the oplog. The oplog defines multiple op types, for instance op: "i", "u", and "d" for insert, update, and delete. For these types, check the "o"/"o2" fields, which hold the corresponding document data and filters.
Then, based on the op type, call the corresponding driver API: db.collection.insert()/update()/delete().
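A rough mongo shell sketch of that replay idea (the source namespace "mydb.mycoll" and the target database/collection are illustrative; note that the exact format of update entries differs between server versions, so this only handles the simple cases):

var oplog = db.getSiblingDB("local").getCollection("oplog.rs");
var target = db.getSiblingDB("mydb_copy").getCollection("mycoll");

oplog.find({ ns: "mydb.mycoll", op: { $in: ["i", "u", "d"] } }).forEach(function (entry) {
  if (entry.op === "i") {
    target.insertOne(entry.o);            // o holds the inserted document
  } else if (entry.op === "u") {
    target.updateOne(entry.o2, entry.o);  // o2 = filter, o = update document
  } else if (entry.op === "d") {
    target.deleteOne(entry.o);            // o holds the _id of the deleted document
  }
});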
I will be running a nightly cron job to query a collection and then send results to another system.
I need to sync this collection between two systems.
Documents can be removed from the host and this deletion needs to be reflected on the client system.
So, my question is: is there a way to query for documents that have been recently deleted?
I'm looking for something like db.Collection.find({RECORDS_THAT_WERE_DELETED_YESTERDAY});
I was reading about parsing the oplog. However, I don't have one set up yet. Is that something you can introduce into an existing DB?
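For what it's worth, once a replica set (and therefore an oplog) is in place, deletions for one namespace can be found with something like the query below; "mydb.mycoll" is illustrative, each matching entry's o field holds the _id of the deleted document, and the ts field can be used to bound the window to yesterday:

// Delete operations recorded in the oplog for one collection.
db.getSiblingDB("local").getCollection("oplog.rs").find({ op: "d", ns: "mydb.mycoll" })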
We have a collection of documents, and each document has a time window associated with it (for example, fields like 'fromDate' and 'toDate'). Once a document has expired (i.e. its toDate is in the past), it is no longer accessed by our clients.
So we wanted to purge these documents to reduce the number of documents in the collection and thus make our queries faster. However, we later realized that this past data could be important for analyzing how the data changes over time, so we decided to archive it instead of purging it completely. This is what I've come up with (a rough sketch in code follows these steps):
Let's say we have a "collectionA" which contains these past documents.
Query all the expired documents in "collectionA" (queries are made on the secondary server).
Insert them into a separate collection called "collectionA-archive".
Delete the documents from "collectionA" that were successfully inserted into the archive.
Delete documents in "collectionA-archive" that meet a certain condition (we do not want to keep a huge archive).
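A minimal mongo shell sketch of the steps above (the toDate field comes from the question; the cutoff values and the read preference call are illustrative):

var source = db.getCollection("collectionA");
var archive = db.getCollection("collectionA-archive");

// Steps 1-2: read expired documents (from a secondary) and copy them to the archive.
source.find({ toDate: { $lt: new Date() } }).readPref("secondary").forEach(function (doc) {
  archive.insertOne(doc);                // writes always go to the primary
  // Step 3: remove the original only after the archive insert succeeded.
  source.deleteOne({ _id: doc._id });
});

// Step 4: trim the archive, e.g. drop anything that expired more than a year ago.
var oneYearAgo = new Date(Date.now() - 365 * 24 * 60 * 60 * 1000);
archive.deleteMany({ toDate: { $lt: oneYearAgo } });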
My question here is: even though I'm making the queries on the secondary server, since the insertions happen on the primary, do the documents inserted into the archive collection make it into the working set of the primary? The last thing we need is these past documents sitting in the primary's RAM, which could affect the performance of our live API.
I know one solution could be to insert the past documents into a separate DB server, but acquiring another server is a bit of a hassle, so I would like to know if this is achievable within one server.
In a MongoDB server, there may be multiple databases, and each database can have multiple collections, and a collection can have multiple documents.
Does a lock apply to a collection, a database, or a server?
I asked this question because, when designing a MongoDB database, I want to determine what to store in a database and what to store in a collection. My data can be partitioned into different parts, and I want to be able to move one part from a MongoDB server to a filesystem without being hindered by a lock that applies to another part, so I wish to store the parts in such a way that different parts have different locks.
Thanks.
From the official documentation: https://docs.mongodb.com/manual/faq/concurrency/
Basically, it's global / database / collection.
But with some storage engines it can lock at the document level too, for instance with WiredTiger (only with MongoDB 3.0+).
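One way to check which lock scopes and which storage engine a given server reports, in the mongo shell (the output layout varies by version):

// Lock scopes reported by the server, e.g. Global, Database, Collection, ...
var status = db.serverStatus();
printjson(Object.keys(status.locks));

// The active storage engine determines whether document-level concurrency applies.
print(status.storageEngine.name);   // e.g. "wiredTiger"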
I am confused about how MongoDB renames collections and how much time it will take to rename a very large collection.
Here is the scenario: I have a MongoDB collection with a very large amount of data (588 million documents), which slows down finds and insertions, so I am creating an archive collection to hold all this data.
For this, I am thinking of renaming the old collection to oldCollectionName_archive and starting with a fresh collection named oldCollectionName.
I am planning to do this with the following command:
db.oldCollectionName.renameCollection("oldCollectionName_archive")
But I am not sure how much time it will take.
I read the MongoDB docs and many Stack Overflow answers about collection renaming, but I could not find any data on whether the size of the collection affects the time required to rename it.
Please help if anyone has knowledge of this or similar experience.
Note: I have read about other issues that can occur during renaming, in the MongoDB documentation and other SO answers.
From the MongoDB documentation (https://docs.mongodb.com/manual/reference/command/renameCollection/):
renameCollection has different performance implications depending on the target namespace.
If the target database is the same as the source database, renameCollection simply changes the namespace. This is a quick operation.
If the target database differs from the source database, renameCollection copies all documents from the source collection to the target collection. Depending on the size of the collection, this may take longer to complete. Other operations which require exclusive access to the affected databases will be blocked until the rename completes. See What locks are taken by some common client operations? for operations which require exclusive access to all databases.
Note that:
* renameCollection is not compatible with sharded collections.
* renameCollection fails if target is the name of an existing collection and you do not specify dropTarget: true.
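For reference, the equivalent admin command form for a same-database rename, with dropTarget shown (the database name mydb is illustrative):

// Same-database rename: a fast, metadata-only namespace change.
// dropTarget: true overwrites the target collection if it already exists.
db.adminCommand({
  renameCollection: "mydb.oldCollectionName",
  to: "mydb.oldCollectionName_archive",
  dropTarget: true
})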
I have renamed multiple collections with around 500M documents. It completes in ~0 time.
This is true for MongoDB 3.2 and 3.4, and I would guess also for older versions.