We are facing a stale read issue for some percentage of users of our MongoDB, Spring-framework-based app. It's a very low-volume app with fewer than 10K hits a day and a record count under 100K. Following is our app tech stack:
MongoDB version: v3.2.8
compile group: 'org.springframework.data', name: 'spring-data-mongodb', version:'1.5.5.RELEASE'
compile group: 'org.mongodb', name: 'mongo-java-driver', version:'2.13.2'
Users reported that after a new record is inserted or an existing one is updated, the new value is not available to read for a certain duration, say half an hour, after which the latest values become visible to all users. However, when connecting with the mongo shell, we are able to see the latest values in the DB immediately.
We confirmed that there is no application-level cache involved in the reported flows. For the JSPs we also added a timestamp to the reported pages and tried private browsing mode to rule out any browser-side caching.
We also tried changing the write concern on the MongoClient and on the MongoTemplate, but the behavior did not change:
MongoClientOptions.builder().writeConcern(WriteConcern.FSYNCED).build(); // global write concern on the MongoClient
mongoTemplate.setWriteConcern(WriteConcern.FSYNCED); // write concern on the Spring MongoTemplate
mongoTemplate.setWriteResultChecking(WriteResultChecking.LOG); // log (rather than throw) on failed writes
Also, the DB logs look clean; no exceptions or errors appear in the MongoDB logs.
We also didn't introduce any new library or DB changes, and this setup had been working perfectly for the past 2 years. Any pointers would be helpful.
NOTE: It's a single mongo instance with no slaves or secondaries configured.
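For reference, the affected flow boils down to a plain read-after-write through the same template; MyRecord here is a hypothetical mapped document class standing in for our real one:

// Hypothetical repro: save, then immediately re-read through the same MongoTemplate
MyRecord saved = new MyRecord("id-1", "new value");
mongoTemplate.save(saved);
MyRecord reRead = mongoTemplate.findById("id-1", MyRecord.class);
System.out.println(reRead.value); // this is where some users reportedly get the stale value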
Write concern does not affect reads; it only controls when a write is acknowledged.
Most likely you have some cache in your application or on your user's system (like their web browser) that you are overlooking.
The second most likely reason is that you are reading from secondaries (i.e. using anything other than the primary read preference).
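If secondaries were in play, a minimal sketch of pinning reads to the primary with the legacy 2.x Java driver from the question's stack would look like this (host and port are placeholders); on a true standalone instance this is effectively a no-op, which points the suspicion back at caching:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ReadPreference;
import com.mongodb.ServerAddress;

// Route every query to the primary, ruling out stale reads from secondaries
MongoClientOptions options = MongoClientOptions.builder()
        .readPreference(ReadPreference.primary())
        .build();
MongoClient client = new MongoClient(new ServerAddress("localhost", 27017), options);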
I have PostgreSQL as my primary database and I would like to take advantage of Elasticsearch as a search engine for my Spring Boot application.
Problem: The queries are quite complex, and with millions of rows in each table, most of the search queries are timing out.
Partial solution: I utilized the materialized-views concept in PostgreSQL and have a job running that refreshes them every X minutes. But on systems with huge amounts of data, and with other database transactions (especially writes) in progress, the views tend to take a long time to refresh (about 10 minutes for 5 views). I realized that the current views are at their capacity and I cannot add more.
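For context, the refresh job is essentially the following JDBC sketch (connection details and view names are placeholders, not the real schema):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;

public class ViewRefreshJob {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/appdb", "app", "secret");
             Statement st = conn.createStatement()) {
            for (String view : List.of("search_view_a", "search_view_b")) {
                // CONCURRENTLY keeps the view readable while it refreshes,
                // at the cost of a slower refresh and a required unique index
                st.execute("REFRESH MATERIALIZED VIEW CONCURRENTLY " + view);
            }
        }
    }
}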
That's when I started exploring other options just for the search and landed on Elasticsearch, which works great with the amount of data I have. As a POC, I used Logstash's JDBC input plugin, but it doesn't support the DELETE operation (bummer).
From here, soft delete is an option I cannot take because:
A) Almost all the tables in the PostgreSQL DB are updated every few minutes, and some of them have constraints on the "name" key which in this case would persist until a clean-up job runs.
B) Many tables in my PostgreSQL DB are referenced with CASCADE DELETE, and it's not feasible for me to update 220 tables' schemas and JPA queries to check for a soft-delete boolean.
The same question mentioned in the link above also suggests PgSync, which syncs PostgreSQL with Elasticsearch periodically. However, I cannot go with that either, since it has an LGPL license, which is forbidden in our organization.
I'm starting to wonder if anyone else has encountered this strange limitation of Elasticsearch and RDBMSs.
I'm open to options other than Elasticsearch to solve my need; I just don't know what the right stack is. Any help here is much appreciated!
I have a node.js app using the Mongoose (v5.11.17) driver. I previously used a standalone DB, but recently, for reliability, the app was migrated to a replica set (of 3). There is code that saves some object a in the DB, after which any number of calls may read object a. Since migrating to the replica set, a read sometimes returns not-found, which never happened with the standalone DB. My guess is that the read preferences are somehow wrong and the read goes to a non-primary node (or maybe the write went to a non-primary). Can I inspect the connection in the mongo shell and query its read preference?
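In the legacy mongo shell, db.getMongo().getReadPrefMode() reports the read preference of the current connection. Independently of that, one way to rule out secondary reads is to pin the read preference in the connection string itself; below is a hedged sketch with the synchronous Java driver (hosts, database, and replica-set name are placeholders; the same URI options apply to a Mongoose connection string):

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;

// readPreference=primary sends every query to the primary, so an acknowledged
// write is visible to the reads that follow it
MongoClientURI uri = new MongoClientURI(
        "mongodb://host1:27017,host2:27017,host3:27017/mydb"
        + "?replicaSet=rs0&readPreference=primary&w=majority");
MongoClient client = new MongoClient(uri);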
I'm sure I can write an easy script that simply drops the entire collection from the database, but that seems very clumsy as a long-term solution.
Currently, we don't have delete endpoints that actually DELETE; we have PUT endpoints that mark an entry as "DONT SHOW/REMOVED" and another "undelete" endpoint that restores visibility, since we technically don't want to delete any data in our implementation of this medical database, for liability purposes.
Does JMeter have a way to talk to Mongo and delete? I know there is a deprecated way to talk to Mongo via JMeter, but I'm not sure about any modern solutions.
Since I can't add unused code to the repo, does this mean the only solution is for me to make an "extra endpoint" outside of the repo that JMeter can access to delete each entry?
That seems like a viable solution; I'm just not sure whether it's the only way to go about it and whether I'm missing something.
The MongoDB Test Elements were deprecated due to low interest: keeping the MongoDB driver shipped with JMeter up to date would require extra effort, and the number of users of the MongoDB Test Elements was not that high.
Mailing List Message
Associated JMeter issue
However, given that you don't test MongoDB per se and plan to use the JMeter MongoDB elements only for setup/teardown actions, I believe you can go ahead.
You can get the MongoDB test elements back by adding the next line to the user.properties file (it overrides the default not_in_menu list, in which the deprecated elements are hidden):
not_in_menu=
This will "unhide" the MongoDB Source Config and MongoDB Script elements, which you will be able to use for cleaning up the DB. See How to Load Test MongoDB with JMeter for more information, sample queries, tips, and tricks.
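If reviving the deprecated elements is not an option, the same cleanup can live in a JSR223 sampler; here is a minimal sketch against the plain MongoDB Java driver (URL, database, collection, and the tagging field are placeholders, not your API's actual schema):

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class TestDataCleanup {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> entries = client
                    .getDatabase("medicaldb").getCollection("entries");
            // Delete only the documents created by the load-test run
            entries.deleteMany(new Document("createdBy", "jmeter"));
        }
    }
}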
MongoDB contains data ready for client-side apps. The raw data is stored in Google BigQuery (GBQ). Each day a lot of new data is added to GBQ, and once a day pretty much everything in MongoDB needs to be updated according to the most recent data in GBQ. All outdated (not updated) records must be deleted.
What is the right way to handle the MongoDB update with close to zero downtime?
Among the crazy solutions: maybe I should have two instances of MongoDB, one in production and another being updated. Once the second DB is updated, I'll run a Google Kubernetes Engine deploy with changed configs, so all clients are smoothly moved from the previous data to the updated data, without ever seeing partially updated data and without downtime. Though, I have never heard of such a solution, so I'm not sure it's the right one.
Another solution is to have two versions of each collection under a single MongoDB instance. Once a collection is updated, the server switches to that collection.
The second solution seems a good option: if you know the trigger for the update, you can keep downtime to a minimum by creating a new collection (named by date or a unique serial, maybe) and updating your code accordingly.
I had some good experience doing this for a fashion website some time back, where we scraped data (using Scrapinghub) and imported it into MongoDB (collections stored by date) and used it accordingly. Our scraping ran early in the morning (5-6 AM), so when our editors/curators came into the office, they would start using the current dated collection (via the web interface, of course :) )
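If you'd rather not touch the collection name in the code at all, a variation on the same idea is to load into a staging collection and atomically rename it over the live one; a minimal sketch (database and collection names are placeholders):

import com.mongodb.MongoNamespace;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.model.RenameCollectionOptions;

public class CollectionSwap {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            // The daily job first imports the fresh GBQ export into "products_staging";
            // the rename with dropTarget=true then swaps it into place, so readers
            // see either the old data set or the new one, never a partial mix.
            client.getDatabase("app")
                  .getCollection("products_staging")
                  .renameCollection(new MongoNamespace("app", "products"),
                          new RenameCollectionOptions().dropTarget(true));
        }
    }
}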
I'm trying to set up an app that will act as a front end to an externally updated mongo database. The data will be pushed into the database by another process.
So far I have the app connecting to the external mongo instance and pulling data out with no issues, but it's not reactive (it doesn't see any of the new data going into the mongo database).
I've done some digging, and so far I can only find that I might need to set up a replica set and use the oplog. Is there a way to do this without going to a replica set (or is that the best way anyway)?
The code so far is really simple: a single collection, a single publication (pulling out the last 10 records from the database), and a single template just displaying that data.
No deps that I've written (not sure if that's what I'm missing).
Thanks.
Any reason not to use the oplog? From what I've read, it is the recommended approach even if your DB isn't updated by an external process, and a must if it is.
Nevertheless, even without the oplog your app should see the changes made to the DB by the external process. It will take longer (up to 10 seconds, since Meteor falls back to poll-and-diff), but it should update.
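For reference, once a replica set exists, enabling oplog tailing is a matter of pointing the app at the replica set's local database via the MONGO_OPLOG_URL environment variable before starting Meteor (hosts, database, and replica-set name below are placeholders):

MONGO_URL="mongodb://host1:27017,host2:27017/mydb?replicaSet=rs0"
MONGO_OPLOG_URL="mongodb://host1:27017,host2:27017/local?replicaSet=rs0"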