Matching documents of mongodb with json - mongodb

Is there any way to verify that all documents in a MongoDB database were entered correctly, i.e. to check that the data in the JSON file and the inserted data are the same?
If yes, how do I do it? Consider that there are 3 million documents in the db.
I want to do it with JavaScript.
Thanks

You will have to run a find for every document that you expect to be in the database, verifying that there is in fact an exact match present (just use the entire document as the match criteria).
In the future, you can use safe mode (safe=True in most drivers, but the syntax varies slightly) to make sure your writes do not fail. Using safe mode will alert you as to the results of the write.
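A minimal sketch of that check for the legacy mongo shell, assuming the source file is a JSON array at a placeholder path and the target collection is db.mycoll (both names are assumptions):

var docs = JSON.parse(cat("/path/to/source.json")); // cat() reads a file into a string in the legacy shell
var missing = 0;
docs.forEach(function (doc) {
    // Use the entire expected document as the match criteria
    if (db.mycoll.findOne(doc) === null) {
        missing++;
        print("No exact match for: " + JSON.stringify(doc));
    }
});
print(missing + " of " + docs.length + " documents had no exact match");

With 3 million documents this is only practical if each lookup can use an index, so keeping _id (or another indexed field) in the exported documents helps.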

Related

Firestore write behavior, partial document write

I wanted to get a better understanding of how writes to Firestore take place when a client's internet connection might be intermittent.
I know there are batch writes for cases when you want to update multiple docs in an all-or-nothing write.
What I wanted to know is: if I'm writing to a single document that has, let's say, 30 fields, is a Firestore write all-or-nothing per document, or could there be a case where only some of the 30 fields update while the others fail due to internet issues?
Thanks!
A document write is all or nothing. It's not possible for a document to end up partially written in any circumstance.

In MongoDB, is there a way to update many documents and get the documents that were modified in a single call?

I'm working with the Mongo Java Driver, but looking through Mongo's documentation, it doesn't look driver specific.
update(filter, update) can update multiple documents but returns a WriteResult which only provides flags/counts.
findOneAndUpdate(filter, update) returns the actual document that was modified, but it can only update one document at a time.
Is there no way to do this in one call? If not, the client would have to call find(filter), then update(filter, update), then find(...) with a new filter matching the IDs obtained in the initial find (since the update can potentially change the document values that the initial filter matched on).
Is there a better way?
I am unaware of any write commands that return a cursor, which is essentially what you are asking for, nor am I seeing anything relevant in the driver source.
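A minimal mongo-shell sketch of that two-step workaround; the collection name, filter, and update below are placeholders:

var filter = { status: "pending" };             // placeholder filter
var update = { $set: { status: "processed" } }; // placeholder update
// Collect the _ids of the documents that are about to be modified
var ids = db.items.find(filter, { _id: 1 }).map(function (d) { return d._id; });
// Update all matching documents in one command
db.items.update(filter, update, { multi: true });
// Re-fetch the modified documents by _id, since the update may have changed
// the fields that the original filter matched on
var modified = db.items.find({ _id: { $in: ids } }).toArray();

Note that this is not atomic: documents that start matching the filter between the initial find and the update will be updated but missed by the _id list.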

Import/Export of Mongo Collections, Preserving _id

I have a MEAN database application with a number of Mongo collections with hierarchical relationships via ObjectId. A copy of the application works locally offline, and another copy runs on the production server.
The data collectively describe rules and content that drive a complex process. These data need to be entered offline so that the processes can be tested before the data go into the production environment.
What I assumed I would be able to easily do is to export selected documents as JSON, then relatively simply import them into the production database. So, the system would have a big "Export" button that would take the current document and all subdocuments and related documents, and export them as a single JSON file. Then, my "Import" button would parse that JSON file on the production server.
So, exporting is no problem. Did that in a couple of hours.
But, I quickly found that when I import a document, its _id field value is not preserved. This breaks relationships, obviously.
I have considered writing parsing routines that preserved these relationships by programmatically setting ObjectIds in parent documents after the child documents have been saved. This will be a huge headache though.
I'm hoping there is either:
a) ... an easy way to import a JSON document with _id fields intact, or ...
b) ... another way to accomplish this entirely that is easier than I am making it.
I appreciate any advice.
There's always someone who doesn't know the answer but complains about the question anyway. The question is clear and the problem is familiar.
Indeed, Mongoose will overwrite any value you provide for _id when you create a document either via the create() method or using the constructor (var thing = new Thing()).
Also, mongoexport/mongoimport will not fill the need to do this programmatically, at least not easily.
If I'm understanding correctly, you want to export a subset of documents, along with any related documents, keeping references intact. Then, you want to import this data into a remote system, again, keeping references intact.
The approach you took would work just fine except it will destroy all references, as you found out.
I've worked on a similar problem and I believe that the best way to do this is to do what it sounds like you wanted to avoid. That is, you'll iterate over your collections and let Mongo generate its _ids as it will. Add your child documents first, then set the references correctly in your parent documents. I really don't think there is a better way that still gives you granular control.
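A minimal mongo-shell sketch of that approach; exportedChildren and exportedParents are assumed to hold the parsed JSON arrays, children/parents and childId are placeholder names, and the exported _id values are assumed to be plain strings:

var idMap = {}; // maps exported _id -> newly generated _id
exportedChildren.forEach(function (child) {
    var oldId = child._id;
    delete child._id; // let Mongo generate a fresh _id
    idMap[oldId] = db.children.insertOne(child).insertedId;
});
exportedParents.forEach(function (parent) {
    parent.childId = idMap[parent.childId]; // rewrite the reference to the new _id
    delete parent._id;
    db.parents.insertOne(parent);
});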
In current versions of MongoDB you can use db.copyDatabase(). Start the mongo shell on the instance you want to copy the database to and run the following command:
db.copyDatabase(fromDB, toDB)
For more options and details, refer to the db.copyDatabase() documentation.
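If the source database lives on a different host (as it does in a local-to-production copy), the same command accepts the source hostname as a third argument; a hedged example with placeholder names:

// Run on the target (production) instance; database names and host are placeholders.
// db.copyDatabase() was deprecated in MongoDB 4.0 and removed in 4.2.
db.copyDatabase("myapp_local", "myapp_production", "dev.example.com")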

change collection data formats within Meteor

Looking for the best way to fix data formats in my Meteor app. When I started, I wasn't using anything like SimpleSchema or being as consistent as I should have been with Date formats.
So now I'd like to get everything back to proper Date objects.
I'm still new-ish to Mongo, and I was a little surprised to find (and please correct me if I'm wrong) that there's no way to update all records and modify an attribute using its current value. I've got timestamps that came from an API POST that might be Strings, epoch times from new Date().getTime(), some actual Dates, etc.
I plan to use moment(currentValue).toDate() to fix this. I'm using percolate:migrations for data changes 1) so that changes stay in my repo and 2) so data is consistent wherever the app is run. I've looked at this question and I assume I'll need to iterate over my collections. But snapshot() isn't available in Meteor.
Do I need to write and manually run a mongo script for this?
Generally I prefer to run migration scripts from the mongo shell, since it's easier to execute (compared to deploying the code that runs the migration) and it gives you access to the full mongo API. You can run load("path/to/script") in the mongo console if you want to predefine your script.
snapshot() ensures you won't modify the same document twice. From the MongoDB docs:
Append the snapshot() method to a cursor to toggle the “snapshot” mode. This ensures that the query will not return a document multiple times, even if intervening write operations result in a move of the document due to the growth in document size.
Running without snapshot() could result in passing a Date object (one that was just converted) to your update function. Since you are planning to cover this case anyway (you say you already have some Date objects in your db), it doesn't change much. Ergo, you can run this from Meteor without snapshot(), but you might as well use the shell to get used to it :)
And you are correct that there is no way to update a document based on its current value. Looping through all documents and updating them one by one is rather slow, so if you have a huge collection you might want to schedule some downtime.
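For example, a minimal shell migration in the spirit of the question; "events" and "timestamp" are placeholder names, and new Date() stands in here for moment(currentValue).toDate():

db.events.find().snapshot().forEach(function (doc) {
    var current = doc.timestamp;
    if (current instanceof Date) return; // already a proper Date, skip it
    // new Date() accepts both epoch milliseconds and parseable date strings
    db.events.update({ _id: doc._id }, { $set: { timestamp: new Date(current) } });
});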

How to store query output in temp db?

I am really new to programming, but I am studying it. I have one problem which I don't know how to solve.
I have a collection of docs in MongoDB and I'm using Elasticsearch to query the fields. The problem is that I want to store the output of a search back in MongoDB, but in a different DB. I know that I have to create a temporary DB which has to be updated with every search result. But how do I do this? Or point me to documentation to read so I can learn it. I will really appreciate your help!
Mongo does not natively support "temp" collections.
A typical thing to do here is to not actually write the entire results output to another DB, since that would be pointless: Elasticsearch does its own caching, so you don't need any layer over the top.
Also, due to IO concerns it is normally a bad idea to write, say, a result set of 10k records to Mongo or another DB.
There is a feature request for what you talk of: https://jira.mongodb.org/browse/SERVER-3215 but no planning as of yet.
Example
You could have a table of results.
Within this table you would have a doc that looks like:
{keywords: ['bok', 'mongodb']}
Each time you search, as you scroll through each result item you would write a row to this table, populating the keywords field with keywords from that search result. This would be done per result item, per result list, per search. It would probably be best to just stream each search result to MongoDB as it comes in. I have never programmed Python (though I wish to learn), so here is an example in pseudocode:
var elastic_results = [ /* result documents returned from Elasticsearch */ ];
elastic_results.forEach(function (result) {
    // Split the phrases in this result down into a keywords array
    var keywords = splitIntoKeywords(result); // splitIntoKeywords is a placeholder helper
    // Lazy insert: no need to batch or to shrink the data into one go, just stream it in
    db.results_collection.insert({ keywords: keywords });
});
So as you go along your results you basically just mass insert as fast as possible, creating a sort of "stream" of input to MongoDB. It can handle this quite well.
This should then give you a shardable list of words and language verbs to run things like map-reduce jobs on, and to aggregate statistics about them.
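For instance, a hedged sketch of that statistics step, using the aggregation pipeline rather than map-reduce (collection and field names as above):

// Count how often each keyword appears across all stored search results
db.results_collection.aggregate([
    { $unwind: "$keywords" },
    { $group: { _id: "$keywords", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
]);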
Without knowing more and more about your scenario this is pretty much my best answer.
This does not use the temp table concept but instead makes your data permanent which is fine by the sounds of it since you wish to use Mongo as a storage engine for further tasks.
Actually, there is a MongoDB river plugin to work with Elasticsearch...
db.your_table.find().forEach(function(doc) { db.another_table.insert(doc); });