OpenSearch: use Bulk to reload documents with the same _id - opensearch

I plan to use the bulk service to load documents into OpenSearch, and I create the _id myself. In some cases the job fails (or something similar) and I need to redo it, which means some documents will be loaded twice. Can I use the _id to avoid duplication? And if that does help avoid duplicates, does OpenSearch use versioning to keep both versions, or does it just remove the old one?
Thanks
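For context, this is roughly what a bulk request with caller-supplied _id values looks like using the opensearch-js client (the index name and document fields below are made up). Re-indexing an existing _id overwrites the stored document rather than keeping both copies: the internal _version is incremented, only the latest version is kept, and the bulk response reports such items as "updated" instead of "created". If a re-run should skip documents that already exist, the "create" action can be used instead of "index", which makes duplicates fail with a 409 instead of overwriting.

import { Client } from '@opensearch-project/opensearch';

const client = new Client({ node: 'https://localhost:9200' });

// Hypothetical documents with caller-generated ids.
const docs = [
  { id: 'order-1001', source: { status: 'shipped' } },
  { id: 'order-1002', source: { status: 'pending' } },
];

async function loadBatch() {
  // Each document becomes one action line plus one source line.
  // "index" overwrites an existing _id; "create" would fail on duplicates instead.
  const body = docs.flatMap((d) => [
    { index: { _index: 'orders', _id: d.id } },
    d.source,
  ]);

  const response = await client.bulk({ body });

  // Re-indexed documents come back with result "updated" and an incremented _version.
  for (const item of response.body.items) {
    console.log(item.index?._id, item.index?.result, item.index?._version);
  }
}

loadBatch().catch(console.error);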

Related

Parallel update issue in MongoDB

We have one field that gets updated by a user action, an admin action, and a cron job at the same time. What should we do to handle this kind of scenario in MongoDB?
e.g. There is a "balance" field in the users collection. While the cron is running and decreasing the user's balance, if the user is recharging and the admin is refunding at the same time, the balance does not get updated correctly.
So please suggest a solution for this problem.
If possible, use update operations. They are atomic at the document level, so this should not be a problem.
If you are using a recent version of MongoDB, you can use transactions for read-update-write operations.
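For the balance example, a minimal sketch of the atomic-update approach with the Node.js driver (connection string, database, collection and field names are assumptions):

import { MongoClient, ObjectId } from 'mongodb';

async function adjustBalance(userId: ObjectId, delta: number) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  try {
    const users = client.db('app').collection('users');
    // $inc is applied atomically on the server, so concurrent cron, admin and
    // user adjustments all take effect instead of overwriting each other.
    await users.updateOne({ _id: userId }, { $inc: { balance: delta } });
  } finally {
    await client.close();
  }
}

// cron: adjustBalance(id, -fee); user recharge: adjustBalance(id, amount); admin refund: adjustBalance(id, refundAmount)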
If you cannot do any of these, you can emulate an optimistic locking scheme using versioning to prevent unintended overwrites. There are several ways this can be done, but it generally goes like this (a code sketch follows below):
Read the document. The document has a version field (which can be an integer or a unique ObjectId; don't use a timestamp).
Make your modifications in memory and update the version (increment the integer, or generate a new ObjectId).
Update the document with a query that includes the old version (version: oldVersion).
This will fail if someone updated the document after you read it but before you updated it. If it fails, retry.
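A sketch of that read-modify-write loop with the Node.js driver (the version and balance field names and the retry limit are assumptions):

import { Collection, ObjectId } from 'mongodb';

async function rechargeWithVersion(users: Collection, userId: ObjectId, amount: number) {
  for (let attempt = 0; attempt < 5; attempt++) {
    // 1. Read the document, including its current version.
    const doc = await users.findOne({ _id: userId });
    if (!doc) throw new Error('user not found');

    // 2. Compute the new state in memory.
    const newBalance = doc.balance + amount;

    // 3. Write back only if the version is still the one we read.
    const result = await users.updateOne(
      { _id: userId, version: doc.version },
      { $set: { balance: newBalance }, $inc: { version: 1 } }
    );
    if (result.modifiedCount === 1) return; // success

    // Someone else updated the document in between; loop to re-read and retry.
  }
  throw new Error('gave up after too many concurrent updates');
}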

Firestore full collection update for schema change

I am attempting to figure out a solid strategy for handling schema changes in Firestore. My thinking is that schema changes would often require reading and then writing to every document in a collection (or possibly documents in a different collection).
Here are my concerns:
I don't know how large the collection will be in the future. Will I hit any limitations on how many documents can be read in a single query?
My current plan is to run the schema change script from Cloud Build. Is it possible this will timeout?
What is the most efficient way to do the actual update? (e.g. read document, write update to document, repeat...)
Should I be using batched writes?
Also, feel free to tell me if you think this is completely the wrong approach to implementing schema changes, and suggest a better solution.
I don't know how large the collection will be in the future. Will I hit any limitations on how many documents can be read in a single query?
If the number of documents gets too large to handle in a single query, you can start paginating the results.
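A sketch of paginated processing with the Firestore Admin SDK for Node.js (the collection name, page size and the per-document work are assumptions):

import * as admin from 'firebase-admin';

admin.initializeApp();
const db = admin.firestore();

async function migrateInPages(migrate: (snap: admin.firestore.QueryDocumentSnapshot) => Promise<void>) {
  const pageSize = 300;
  let last: admin.firestore.QueryDocumentSnapshot | undefined;

  while (true) {
    // Order by document ID so the cursor stays stable while documents are rewritten.
    let query = db.collection('users')
      .orderBy(admin.firestore.FieldPath.documentId())
      .limit(pageSize);
    if (last) query = query.startAfter(last);

    const page = await query.get();
    if (page.empty) break;

    for (const doc of page.docs) {
      await migrate(doc);
    }
    last = page.docs[page.docs.length - 1];
  }
}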
My current plan is to run the schema change script from Cloud Build. Is it possible this will timeout?
That's impossible to say at this moment.
What is the most efficient way to do the actual update? (e.g. read document, write update to document, repeat...)
If you need the existing contents of a document to determine its new contents, then you'll indeed need to read it. If you don't need the existing contents, all you need is the path, and you can consider using the Node.js API to only retrieve the document IDs.
Should I be using batched writes?
Batched writes have no performance advantages. In fact, they're often slower than sending the individual update calls in parallel from your code.
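If the new contents don't depend on the old ones, a sketch of the "document IDs only" variant mentioned above, with the individual updates sent in parallel (collection and field names are assumptions):

import * as admin from 'firebase-admin';

admin.initializeApp();
const db = admin.firestore();

async function stampSchemaVersion() {
  // listDocuments() returns DocumentReferences without reading any field data.
  const refs = await db.collection('users').listDocuments();

  // Send the individual updates in parallel rather than batching them;
  // for very large collections you would chunk this list first.
  await Promise.all(refs.map((ref) => ref.update({ schemaVersion: 2 })));
}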

Is it possible to populate without schema

I have an application that uses Mongo's oplog to follow insert, update and delete operations. I would like to get the full document through Mongoose (populated) every time one of these operations occurs (findById with the id I get from the oplog), but I do not have the schemas, as they are defined in another application.
Do you think it is possible to get the full document without cloning the other application and registering each schema for each model?
Thanks in advance!
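Not from the thread, but one possible workaround to sketch the idea: register a permissive model with an empty, non-strict schema bound to the existing collection by name, so documents can at least be read back in full; populate() would still need the original ref metadata, so related documents would have to be looked up with extra queries.

import mongoose from 'mongoose';

// Empty schema with strict:false accepts whatever fields the documents contain.
// 'users' is a placeholder for the real collection name.
const AnyUser = mongoose.model(
  'AnyUser',
  new mongoose.Schema({}, { strict: false }),
  'users'
);

async function fetchFullDocument(idFromOplog: string) {
  // Returns the raw document; no populate, since no refs are defined here.
  return AnyUser.findById(idFromOplog).lean();
}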

Is there a risk of saving document in Mongo with _id from other DB?

I want to save documents to a designated Mongo collection from a 3rd-party API that also uses Mongo. I want to keep those ids so I can check that I'm not saving duplicates.
Is there any risk that those ids may collide one day?
Is it possible to have isolated ObjectID generator for a specific collection?
(a) The likelihood is very low, but I would advise against it.
(b) Yes, it is. I can think of modifying it in the pre-save hook of your schema definition. There might also be modules out there for this.
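A hedged sketch of the pre-save idea for (b), keeping the 3rd-party id in its own unique field while the collection generates its own _id values (schema and field names are assumptions):

import mongoose from 'mongoose';

const importedSchema = new mongoose.Schema({
  // The id coming from the other DB, kept only for duplicate checks.
  sourceId: { type: String, required: true, unique: true },
  payload: mongoose.Schema.Types.Mixed,
});

importedSchema.pre('save', function (next) {
  // Generate a local ObjectId for new documents instead of reusing the foreign one.
  if (this.isNew) {
    this._id = new mongoose.Types.ObjectId();
  }
  next();
});

export const ImportedDoc = mongoose.model('ImportedDoc', importedSchema);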

Zend service solr update document

In Zend_Service_Solr I can add or delete a record.
$solr->addDocument($document);
Is there any way that I can update a record? I couldn't find any documentation for that. Or is there an extension for doing this?
In most cases, updating a document in Solr means adding the same document again (with the same value for the uniqueKey field).
It's possible to perform certain updates in more recent versions of Solr, but these require all fields to be stored (so that the document can just be re-added internally) and a custom update syntax. There is also some work in progress on non-textual DocValues being updatable without having to resubmit the complete document, but this is currently not in any released version of Solr.
The best way to handle this is usually to just re-submit the document with updated values, and have a straightforward way of doing that in your application code.