How to handle DB failure at startup and at runtime - MongoDB

The premises:
I'm using Node.js with the MongoDB native driver 2.0+. I have created one connection (pool) which is kept open and reused in all modules. When production ready, the app and the DB will be hosted on the same server, probably a VPS.
Questions about MongoDB in a production-ready environment:
Should I anticipate that mongod can crash? And if so, should I have some kind of autostart for it? Can bugs in my code be responsible for mongod crashing?
If I should anticipate crashes, this probably also means that I should anticipate disruptions in my app when the DB can't be reached. What is the proper way to handle these? How long should I expect these disruptions to be?
If I start my app, manually shut down mongod and then hit a route, the default response seems to be "waiting/loading" until mongod is up again. I guess this default behaviour is OK if I don't expect disruptions to last more than a few seconds?
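For illustration, a minimal sketch of how I imagine the relevant driver options could be tuned (the option names autoReconnect, reconnectTries, reconnectInterval and bufferMaxEntries are my reading of the 2.x native driver docs, so treat them as assumptions; the URL and values are placeholders):

    var MongoClient = require('mongodb').MongoClient;

    // Sketch: bound how long the driver keeps retrying/buffering while mongod is down.
    // Option names assumed from the 2.x native driver; values are placeholders.
    MongoClient.connect('mongodb://localhost:27017/myapp', {
      server: {
        autoReconnect: true,      // keep trying to reconnect to mongod
        reconnectTries: 30,       // give up after 30 attempts...
        reconnectInterval: 1000   // ...spaced one second apart
      },
      db: {
        bufferMaxEntries: 0       // fail operations immediately instead of queueing them
      }
    }, function (err, db) {
      if (err) {
        console.error('Could not connect to MongoDB:', err.message);
        return;
      }
      // db is the single connection (pool) that gets reused in all modules
    });

With bufferMaxEntries left at its default, operations queue up until mongod is reachable again, which is the "waiting/loading" behaviour described above; setting it to 0 makes them fail fast instead.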
If mongod is not up when the app starts, an exception is thrown and the app will not start. This seems fine, because without the DB the app can't do anything. Or should this be handled in another way?
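One alternative to simply letting the app die at startup is to retry the initial connection a few times before giving up. A minimal sketch (the URL, retry count and delay are placeholders, and startApp is a hypothetical function that wires the db handle into the modules):

    var MongoClient = require('mongodb').MongoClient;

    // Sketch: retry the initial connection a few times before exiting.
    function connectWithRetry(url, retriesLeft, callback) {
      MongoClient.connect(url, function (err, db) {
        if (!err) { return callback(null, db); }
        if (retriesLeft === 0) { return callback(err); }   // give up, let the app exit
        console.error('MongoDB not reachable, retrying in 5 seconds...');
        setTimeout(function () {
          connectWithRetry(url, retriesLeft - 1, callback);
        }, 5000);
      });
    }

    connectWithRetry('mongodb://localhost:27017/myapp', 5, function (err, db) {
      if (err) { throw err; }   // still down after all retries: fail fast
      startApp(db);             // hypothetical: start the HTTP server once the DB is up
    });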
I have searched extensively for this online but have not found anything useful. Maybe I don't know which search terms to use...
There are a lot of questions crammed into one post here, but I hope someone can give me some answers or point me to some good reading. The big question is: how do I handle DB failure at startup and at runtime?

Related

Google Cloud SQL Migration Job stuck on Running

I've got a database on Google SQL that is used by our application running on kubernetes in GKE.
The mysql instance is running on 5.6, and I need to update it to 5.7, so I tried using the new migration jobs.
I've set up the connection profile and all the required permissions for the source DB, then followed the instructions to make a continuous migration.
The job says it's running, migrating the ~450 GB database. After about a day it's still running, the storage used seems to have stopped growing, and the replication delay is at 0. The source database is not currently in use (that's why I'm using it to try this out before doing the same with a more important DB).
According to this, if the dump phase is done, I should be able to promote the instance, but the promote button remains greyed out, and there's no way to check the running state (it only says "running", and I don't see any way to check if it's dumping, on CDC, or anything else).
The documentation seems a bit lacking, and I couldn't find anything by googling around. Has anyone been using this?
In short, my questions are:
Why can't I promote the instance?
And how can I check what phase the migration is in?
Thanks.
P.S.: the tag that the documentation says should be used on Stack Overflow is google-cloud-database-migration-service, which is too long for Stack Overflow to allow, so I used google-cloud-sql instead.
I am seeing an issue like this, but possibly more frustrating: after a week migrating a 2 TB database, the storage used resets to near zero and the full dump restarts, without any errors or any indication of what happened.

Why was my MongoDB collection deleted automatically?

I have a MongoDB client on three EC2 instances and I have created a replica set. Last time I had a problem with a space constraint that stopped my mongod process, thereby halting the application. Then, in an incident a couple of days back, some of my tables were gone from the database, so I set up logging on the database just to catch it if anything like that happened again. In a fresh incident this morning I was unable to log in to my system, and that's when I found out that the whole database was empty. I checked other SO questions like this one, which suggest setting up a TTL, which I haven't done at all.
Now how do I debug this situation and do a proper root cause analysis? I can't even find anything in my debug logs. The tables just vanished. How do I set up a proper logging mechanism, and how do I ensure that my tables are never deleted again?
Today I got an email from Amazon saying that I was probably running an unsecured version of MongoDB and that this may have caused the issue. So whoever is facing this issue, please go through the Security Checklist provided by MongoDB. There are some points in there that are absolutely necessary:
1. Enable Access Control and Enforce Authentication
2. Encrypt Communication
3. Limit Network Exposure
These three are the core; depending on how many people access your database, you can also configure Role-Based Access Control.
These are all things I have done. Before this incident I had not taken security that seriously, but after being hit by it I made sure I had all the necessary precautions in place.
Hope this helps someone.
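As a rough illustration of point 1, enabling access control boils down to creating an administrative user and then restarting mongod with authorization turned on. A minimal mongo shell sketch (user name, password and roles are placeholders):

    // Run in the mongo shell before enabling authorization (placeholder credentials).
    use admin
    db.createUser({
      user: "siteAdmin",
      pwd: "aStrongPasswordHere",
      roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
    })

    // Then restart mongod with authorization enabled (security.authorization: "enabled"
    // in mongod.conf, or the --auth command line flag) and bind it to a private
    // interface only, so it is not exposed to the public internet.

After that, clients have to authenticate before they can read or drop anything.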

BIRT sessions stay open in Vertica

I'm having enormous problems managing connections in Vertica when developing BIRT reports. The basic idea is that sessions never die, so I always hit the connection cap. This is, of course, a problem, because then you can't use the database at all unless you do a close_all_sessions() to nuke everyone.
This happens at just about every level of development there is. First, in Esproc, when you develop the underlying logic: if there's a bug in your program before the connection.close(), the connection stays open and Esproc opens up a new one on the next execution. This adds up really quickly when you have a couple of users developing stuff on the network.
Next, in Eclipse it's the same thing. You open a report and Eclipse creates a dozen connections that stay open as long as you keep Eclipse open. Then, when you run the report, it creates another bunch of connections, totally ignoring the ones it already has... and if you have bugs in your report, the dozen extras won't close.
Then on our website, same thing... problem running the report, boom, connections won't close EVER. I've had sessions stay open for two weeks with absolutely no activity. They only disappeared when I restarted Tomcat.
I'm at my wit's end here. There doesn't seem to be any way to set a session timeout in Vertica, and I don't even know where to begin looking to solve these problems. Everywhere I could find, the connection timeout was set to 20 seconds, so I would expect a connection to disappear after reaching that time, but of course that's not the case.
I really have no idea what to do and I'm desperate for some help. Can anyone give me a clue? I've been at this for two days now and my brain just can't take any more.
You want to use a connection pool instead of direct JDBC access; it will eliminate the connection issues on Tomcat and improve performance.
See this article for more information.
Define the connection pool (CP) in [Tomcat home]/conf/server.xml
Link the CP to web applications in [Tomcat home]/conf/context.xml
Install Apache Probe or something similar on Tomcat, this will help to test if the CP is correctly defined.
In BIRT reports, use the JNDI URL property to link a data source to the CP
This will solve the problem for the website, but not for the Eclipse designer. Try upgrading to the most recent BIRT and JDBC versions.
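For reference, a minimal sketch of what those pieces might look like. The resource name, host and credentials are placeholders, and the attribute names are for the commons-DBCP pool used by older Tomcat versions (Tomcat 8+ renamed maxActive/maxWait to maxTotal/maxWaitMillis):

    <!-- [Tomcat home]/conf/server.xml, inside <GlobalNamingResources> (placeholder values) -->
    <Resource name="jdbc/vertica" auth="Container" type="javax.sql.DataSource"
              driverClassName="com.vertica.jdbc.Driver"
              url="jdbc:vertica://dbhost:5433/mydb"
              username="report_user" password="secret"
              maxActive="20" maxIdle="5" maxWait="10000"
              removeAbandoned="true" removeAbandonedTimeout="300"/>

    <!-- [Tomcat home]/conf/context.xml: expose the global resource to web applications -->
    <ResourceLink name="jdbc/vertica" global="jdbc/vertica" type="javax.sql.DataSource"/>

    <!-- In the BIRT data source, set the JNDI URL property to: java:comp/env/jdbc/vertica -->

The removeAbandoned settings are what help with the leak described above: connections the report engine forgets to return are closed by the pool after the timeout.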

How should I determine what is issuing a flush_all command

We have a memcached server that is shared by about two dozen apps. One of the web apps (or perhaps one of our utility apps) is issuing a flush_all command periodically. The frequency seems random, or at least we haven't seen a pattern yet. It happens about 10 times an hour.
Here's the rub: I can't find a good way to figure out which app is doing this. The memcached logs are not helpful at all. Here's what I've done so far:
* grep all source code - other than the memcached libraries, I can't see anywhere that we issue this command.
* Enable verbose logging (-vv) in memcached - I see the commands get issued, but the log doesn't show any information about where the command is being issued from.
* Research how to administratively disable this; without an unapproved source patch to memcached I can't figure out a good way to do it.
Has anyone else had this problem? I'm assuming that this is coming from one of our web apps, but it's possible it's from somewhere else too. Any suggestions?
My next step is to set up a second memcached server and move applications over one by one (which will be slow and time consuming). There must be a better way.
A little late, but in case anyone else hits this...
I'd suggest you set up multiple memcache proxies and configure each application to use a different one. The first proxy I found was twemproxy; I have no idea how good it is.
After that you can use the logs for the proxy to identify which application is issuing the commands.
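As a rough sketch of that idea, a nutcracker.yml with one pool per application might look something like this (pool names, addresses and values are placeholders, and the option names are from memory, so double-check them against the twemproxy README):

    # nutcracker.yml - one listener per application, all pointing at the same memcached
    app_frontend:
      listen: 127.0.0.1:22121
      hash: fnv1a_64
      distribution: ketama
      timeout: 400
      servers:
       - 10.0.0.5:11211:1

    app_billing:
      listen: 127.0.0.1:22122
      hash: fnv1a_64
      distribution: ketama
      timeout: 400
      servers:
       - 10.0.0.5:11211:1

Each application is then pointed at its own listener port, so whichever proxy shows the flush_all in its logs or traffic tells you which application sent it.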

How Do I Optimize Zend Framework

I have an application built on Zend Framework that I am trying to optimize.
I did some Xdebug profiling, and although I can't say I understand every nitty-gritty detail of the results I got, some things were quite obvious.
For instance, the file Bootstrap.php seems to be the one gulping most of the time, taking 4,553 ms, which accounts for 92.49% of the total time.
If I dig further, I can see that Zend_Application_Bootstrap_Bootstrap->run takes the bulk of the time. Checking this again, I found that Zend_Controller_Front->dispatch might actually be the call inside Bootstrap.php that takes the time to execute.
The question is: given these indicators, how can I best go about optimizing the application? If the answer is caching, how do I go about applying caching to this situation?
Thanks
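For the caching part of the question, here is a minimal Zend_Cache (ZF1) sketch that caches the result of one expensive call; the cache directory, lifetime, key and buildExpensiveReport() are placeholders, not part of the original app:

    // Minimal ZF1 sketch: cache the result of an expensive operation on disk.
    // APPLICATION_PATH is the constant defined in a standard ZF1 index.php.
    $cache = Zend_Cache::factory(
        'Core',                                   // frontend
        'File',                                   // backend
        array('lifetime' => 3600, 'automatic_serialization' => true),
        array('cache_dir' => APPLICATION_PATH . '/../data/cache')
    );

    if (($result = $cache->load('expensive_report')) === false) {
        $result = buildExpensiveReport();         // hypothetical slow function
        $cache->save($result, 'expensive_report');
    }

This only pays off if the expensive work inside the dispatch is actually cacheable, of course.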
From the look of the callgrinds, on the login page the app is spending most of its time in curl_exec, which is to be expected if you're doing a remote login. But it is doing 10 separate curl_exec calls, which seems excessive. I'm not familiar with the LinkedIn login auth, but is it possible your app is running the remote login code multiple times?
On the standard page request the app is spending most of its time connecting to MySQL, and it seems to be doing this twice. Are you using a remote DB server, and do you need two separate DB connections?
Assuming you are using a remote DB server and it is on the same network as your web server, there seems to be some networking issue there. I'd check the latency to that server if you can, and try connecting to the IP address instead of a hostname to see if that makes any difference (if this is much faster, it would suggest an issue with the DNS setup on your web server).