Update all data in MongoDB or replace MongoDB instance

MongoDB contains data ready for client-side apps. The raw data is stored in Google BigQuery (GBQ). Each day a lot of new data is added to GBQ, and once a day pretty much everything in MongoDB needs to be updated to match the most recent data in GBQ. All outdated (not updated) records must be deleted.
What is the right way to handle the MongoDB update with close to zero downtime?
Among the crazy solutions: maybe I should have two instances of MongoDB, one in production and another being updated. Once the second DB is updated, I'll run a Google Kubernetes Engine deploy with changed configs, so all clients are moved smoothly from the previous data to the updated data, without seeing partially updated data and without downtime. However, I have never heard of such a solution, so I'm not sure it is the right one.
Another solution is to have two versions of each collection under a single MongoDB instance. Once a collection is updated, the server switches to it.

The second solution seems a good option. If you know the trigger for the update, you can keep downtime to a minimum by creating a new collection (named by date or by a unique serial number, for example) and updating your code to use it.
I had a good experience doing this for a fashion website some time back: we scraped data (using Scrapinghub), imported it into MongoDB (collections stored by date), and used the collections accordingly. Our scraping ran early in the morning (5-6 AM), and when our editors/curators came into the office they would start using the collection with the current date (via the web interface, of course :) ).
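
The collection-per-load idea can be sketched roughly as below. This is only a minimal sketch under assumptions, not how the site above actually did it: it assumes a pymongo-style `db` handle, and the `meta` collection, the `active` pointer document, and the `products_` prefix are hypothetical names picked for illustration. The loader fills a fresh dated collection, then flips one pointer document, so readers never see a half-loaded data set.

```python
from datetime import date

POINTER_ID = "active"  # hypothetical _id of the pointer document


def publish_snapshot(db, docs, day, prefix="products_"):
    """Load docs into a new dated collection, then switch readers to it."""
    new_name = prefix + day.strftime("%Y%m%d")
    db[new_name].insert_many(docs)  # readers still use the old collection

    pointer = db["meta"].find_one({"_id": POINTER_ID})
    old_name = pointer["name"] if pointer else None

    # Single-document update: readers see either the old or the new name.
    db["meta"].update_one(
        {"_id": POINTER_ID}, {"$set": {"name": new_name}}, upsert=True
    )
    # Drop the previous snapshot only after the switch.
    if old_name and old_name != new_name:
        db.drop_collection(old_name)
    return new_name


def active_collection(db):
    """Return the collection that readers should query right now."""
    pointer = db["meta"].find_one({"_id": POINTER_ID})
    return db[pointer["name"]]
```

Readers would call `active_collection(db)` per request rather than caching the name. In practice you would probably delay the drop so that in-flight queries on the old collection can finish, and a second load on the same day would need a different name (for example a serial suffix).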

Related

Stale read with Mongo and SpringData

We are facing a stale read issue for some percentage of users of our MongoDB and Spring Framework based app. It's a very low-volume app, with fewer than 10K hits a day and a record count under 100K. Our tech stack is as follows.
MongoDB version: v3.2.8.
compile group: 'org.springframework.data', name: 'spring-data-mongodb', version:'1.5.5.RELEASE'
compile group: 'org.mongodb', name: 'mongo-java-driver', version:'2.13.2'.
Users reported that after a record is inserted or updated, the new value is not available to read for some time, say half an hour. After that, the latest values are reflected and readable by all users. However, when connecting with the mongo terminal, we can see the latest values in the DB.
We confirmed that no application-level cache is involved in the reported flows. For the JSPs, we also added a timestamp to the reported pages and tried private browsing mode to rule out any browser issue.
We also tried changing the write concern in MongoClient and MongoTemplate, with no change in behavior:
MongoClientOptions.builder().writeConcern(WriteConcern.FSYNCED).build(); //Mongo Client
mongoTemplate.setWriteConcern(WriteConcern.FSYNCED); // Spring Mongo template
mongoTemplate.setWriteResultChecking(WriteResultChecking.LOG);
Also, the DB logs look clean; no exceptions or errors appear in the MongoDB logs.
We also didn't introduce any new library or DB changes, and this setup had been working perfectly for the past 2 years. Any pointers would be helpful.
NOTE: It's a single mongo instance with no slaves or secondaries configured.

Write concern does not affect reads.
Most likely you have some cache in your application or on your users' systems (like their web browser) that you are overlooking.
The second most likely reason is that you are reading from secondaries (i.e. using a read preference other than primary).

Insert MongoDB document with an objectId that existed in the past

I have a bunch of collections (12) and I need to rename many of their fields. I can't do it live in the database; all I can do is download a dump and re-upload it.
So I've downloaded the collection with mongodump, manipulated the data, and I'm planning to use mongorestore to push it back to the database.
I'm wondering what will happen with the ObjectIds. I know that an ObjectId is unique throughout the database, so I'm thinking about deleting all the old data right before running mongorestore. Is that OK, or will I still have problems with the IDs?

You can specify whatever value you want for the document ID (_id). You can even use a string instead of an ObjectId.
If you have a production app, you need to perform the upgrade and migrate the data step by step from the application itself.
If your application runs as a single single-threaded process, or can be run that way, this is the simplest case. Otherwise you need a synchronization service.
Be careful with async/await, promises, and other asynchronous processing. Such code reads data into memory at one point in time and continues working with it later, by which time the stored data may have changed; keep that in mind.
You need to:
modify the service so it can handle both data formats
write migration code that goes through all the data and migrates it
modify the service to handle only the new data format once all the data has been migrated
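
The first two steps above might look something like this. It is a sketch under assumptions rather than a definitive implementation: the field names in `RENAMES` are hypothetical, and documents are treated as plain dicts. The same function serves both the dual-format read path and the one-off migration pass, and `_id` is never touched.

```python
# Hypothetical rename map: old field name -> new field name.
RENAMES = {"fname": "first_name", "lname": "last_name"}


def to_new_format(doc):
    """Return a copy of doc with old field names replaced by new ones.

    Documents already in the new format pass through unchanged, so the
    service can call this on every read while both formats coexist.
    """
    out = {}
    for key, value in doc.items():
        new_key = RENAMES.get(key, key)
        # If both names are present, the new-format value wins.
        if new_key in out and key != new_key:
            continue
        out[new_key] = value
    return out


def migrate(docs):
    """One-off pass: convert every document, keeping its _id untouched."""
    return [to_new_format(d) for d in docs]
```

Because `to_new_format` is idempotent, running the migration twice (or on a partially migrated collection) should be harmless.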

Meteor app as front end to externally updated mongo database

I'm trying to set up an app that will act as a front end to an externally updated mongo database. The data will be pushed into the database by another process.
So far I have the app connecting to the external mongo instance and pulling data out with no issues, but it's not reactive (it doesn't see any of the new data going into the mongo database).
I've done some digging, and so far I can only find that I might need to set up replica sets and use the oplog. Is there a way to do this without replica sets (or is that the best way anyway)?
The code so far is really simple, a single collection, a single publication (pulling out the last 10 records from the database) and a single template just displaying that data.
No deps that I've written (not sure if that's what I'm missing).
Thanks.

Any reason not to use the oplog? From what I've read, it is the recommended approach even if your DB isn't updated by an external process, and a must if it is.
Nevertheless, even without the oplog your app should still see the changes made to the DB by the external process. It will take longer (up to 10 seconds), but it should update.

How to handle syncing a user's db with a master db on a server?

So I'm planning an app that will involve having a master DB on a server, let's say 3,000 CDs, with the columns Title, Artist, and Release Date.
1) When a user adds a CD to their collection, it will be added to the app's local SQLite DB. But let's say I spelled a CD title wrong, so I make an update to it. When the user goes to sync, how should I handle an updated row? Should I have a column 'IsUpdated' that is just a numeric value that increases by one every time I update that row? That way, when the app sees that IsUpdated on the server is larger than the local IsUpdated for that particular item, it will know to replace the contents. Does that make sense? Is it even practical? What other options would there be?
2) How would I go about handling the addition of brand new columns, like a Barcode or Price? Do I just push an update for the app that adds the new columns locally, then do the same on the server, and let the rest run its course? That would also feed into number 1 and the syncing issue.

First, you have to give more detail than that. Is the entire 3,000-row master list also replicated down to the remote DB?
Sounds like it.
OK, so if that's the case, this isn't a DB design issue so much as a replication issue.
It's a bad idea to update every row in a table, especially with an update that makes the row longer. You'll be better off just dropping the table and recreating it. <--- That's how it works in an RDBMS on servers; I have no idea if that concept changes on a client DB. And now we get into iPhone questions about replication more than simple DB replication. Would it be better to just republish the app? Is the user data segregated from the server data? Can DDL be run on the local/remote tables after the app is published?
Instead of searching the entire list for changes as you outline in #1, I would keep a dated delta table. The local app would store a last_updated_Datetime, and any records in the delta table after that datetime would need to be brought down. Once they are downloaded, the local system can determine how to apply them. Again, this is inappropriate for mass changes.
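
A dated delta table along those lines could be sketched as follows, using Python's built-in sqlite3 module for both sides purely for illustration. The table names (`cd_delta`, `cds`) and columns are hypothetical, timestamps are assumed to be ISO-8601 text so that string comparison matches time order, and `cd_id` is assumed to be the primary key of the local table.

```python
import sqlite3


def fetch_delta(server, since):
    """Return rows from the (hypothetical) cd_delta table newer than since."""
    return server.execute(
        "SELECT cd_id, title, artist, release_date, changed_at "
        "FROM cd_delta WHERE changed_at > ? ORDER BY changed_at",
        (since,),
    ).fetchall()


def apply_delta(local, rows, last_synced):
    """Upsert downloaded rows locally and return the new sync timestamp."""
    for cd_id, title, artist, release_date, changed_at in rows:
        local.execute(
            "INSERT OR REPLACE INTO cds (cd_id, title, artist, release_date) "
            "VALUES (?, ?, ?, ?)",
            (cd_id, title, artist, release_date),
        )
        last_synced = max(last_synced, changed_at)
    local.commit()
    return last_synced
```

The app would store the returned timestamp as its last_updated_Datetime and pass it as `since` on the next sync.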

SQlite synchronization scheme

I know it is Xmas eve, so it is a perfect time to find hardcore programmers online :).
I have a SQLite DB file that contains over 10K records. I generate the DB from a MySQL database, and I have built the SQLite DB into my iPhone application the usual way.
The records contain information about products and their prices, shops, and the like. This info is of course not static; I use an automatic scheme to populate and keep updating my MySQL DB.
Now, how can I update the iPhone app's SQLite database with the new information available in the MySQL DB? The DB structure is still the same, but the records contain new information.
Thanks.
Ahed
info:
libsqlite3.0,
iphone OS 3.1,
mysql 2005,
Mac OS X 10.6.2

There is a question you need to answer first: how do you determine the set of changed records in your MySQL database?
Or, more specifically, given that the MySQL database is in state A, some transactions occur, and it is now in state B, how do you know what changed between A and B?
Bottom line: you need a schema in MySQL that enables this. Once you have answered that question, you can answer the "how do I sync?" problem.

I have a similar application.
I am using Push Notification to let my users know there is new or updated data available.
Each time a record on the server is updated, I store a sequential record-number alongside the record.
Each UDID that's registered has a "last updated" number associated with it that contains the highest record-number it has ever downloaded.
When any given device comes to get its updates, all database records greater than the UDID's last updated record-number as stored on the server are sent to the device. If everything goes OK, the last updated record-number for the UDID is set to the last record number sent.
The user has the option to fetch all records and refresh his database if he feels any need to sync his device to the entire database.
Seems to be working well.
-t
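
The bookkeeping described in this answer can be modelled in a few lines. This is a toy in-memory sketch, not the poster's actual code: the `SyncServer` class and its method names are hypothetical, and a real server would keep this state in its database.

```python
class SyncServer:
    """Toy in-memory model of the server; all names are hypothetical."""

    def __init__(self):
        self.records = {}      # record key -> (sequence number, payload)
        self.next_seq = 1
        self.last_sent = {}    # device id -> highest sequence number sent

    def upsert(self, key, payload):
        """Store a record and stamp it with the next sequence number."""
        self.records[key] = (self.next_seq, payload)
        self.next_seq += 1

    def pending_for(self, device_id):
        """Records changed since this device's last confirmed download."""
        seen = self.last_sent.get(device_id, 0)
        changed = [(seq, key, payload)
                   for key, (seq, payload) in self.records.items()
                   if seq > seen]
        return sorted(changed)

    def confirm(self, device_id, batch):
        """Advance the device's mark only after a successful download."""
        if batch:
            self.last_sent[device_id] = max(seq for seq, _, _ in batch)

    def reset(self, device_id):
        """Full refresh: forget the mark so everything is sent again."""
        self.last_sent.pop(device_id, None)
```

Keeping `confirm` separate from `pending_for` mirrors the "if everything goes OK" step: a failed download leaves the mark where it was, so the same records are offered again next time.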

You can find many other similar questions by searching for "iphone synchronization":
https://stackoverflow.com/search?q=iphone+synchronization
I'm going to assume that the data is going only from MySQL to SQLite, and not in the reverse direction.
There are a few ways that I could imagine doing this. The first is to just redownload the entire database during every update. Another way, which I'm describing below, would be to create a "log" table to record the modifications to your master table, and then download just the new log entries when doing the update.
I would create a new "log" table in your SQL database to log changes to the table needing synchronization. The log could contain a "revision" column to track the order in which changes were made, a "type" column to specify whether it was an insert, update, or delete, the row ID of the affected row, and finally the entire set of columns from your master table.
You could automate the creation of log entries by using stored procedures as wrappers to modify your master table.
With only 10K records, I wouldn't expect this log table to grow that large.
In SQLite you would then keep track of the latest revision downloaded from MySQL. To update the table, you would download all log entries after that revision and apply them to your SQLite table.
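
Applying downloaded log entries on the SQLite side might look roughly like this. It is a minimal sketch under assumptions: the `products` table, its columns, and the entry layout `(revision, type, row_id, name, price)` are hypothetical stand-ins for your master table, and `id` is assumed to be the local primary key.

```python
import sqlite3


def apply_log(local, entries, last_revision):
    """Replay change-log entries in revision order on the local table.

    Each entry is (revision, type, row_id, name, price); the products
    table and its columns are hypothetical.
    """
    for revision, kind, row_id, name, price in sorted(
        entries, key=lambda entry: entry[0]
    ):
        if revision <= last_revision:
            continue  # already applied during an earlier sync
        if kind == "delete":
            local.execute("DELETE FROM products WHERE id = ?", (row_id,))
        else:  # "insert" and "update" both carry the full row
            local.execute(
                "INSERT OR REPLACE INTO products (id, name, price) "
                "VALUES (?, ?, ?)",
                (row_id, name, price),
            )
        last_revision = revision
    local.commit()
    return last_revision
```

The returned revision is what the app would store locally and send with the next request, so the server only returns newer log entries.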
