Rack: Bundler::GemNotFound errors during `bundle install --deployment` - sinatra

So I have a few machines in production running a Sinatra app on top of Rack. Usually everything is hunky dory until Puppet (which we use to sync changes to our servers) notices that the project's Gemfile.lock has changed and, as a result, issues bundle install --binstubs --deployment to pull in the new gems. When this happens, ANY HTTP request will return a 500 error when the app calls into Bundler to require our gems, because the new gems haven't been installed yet.
We have a monitoring process that periodically makes an HTTP request to check that the server is alive, so there should always be at least one Rack process hanging around to handle requests; yet when this happens, there are no Rack processes alive. The PassengerMinInstances directive looked like it might help, but it only seems relevant if the problem were with spawning new instances.
I should probably note that Puppet doesn't actually restart Rack (by touching the restart.txt file) until after the bundle install has completed, so it doesn't make sense that our Rack processes would go away at this point. Has anyone encountered anything like this? Is there some Rack option I've overlooked to not reload the entire environment on every request?

I know this doesn't directly answer your question, but what I've done in the past to get around this kind of thing is to deploy the app into version-numbered directories, with a symlink pointing at the current one and an (Nginx) proxy server routing requests through the link. At the end of the deployment, the deploy script points the link at the new version.
It seemed to work well enough for me, and if things really go wrong you can always manually repoint the link back to the previous version.
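For what it's worth, the link swap can also be done atomically so requests never see a missing path. Here's a minimal Go sketch of that idea; the directory layout (/srv/myapp/releases/v42 and a current link) is made up for illustration:

package main

import (
	"log"
	"os"
	"path/filepath"
)

// repointCurrent atomically points the "current" symlink at the given
// release directory by building the new link under a temporary name and
// renaming it over the old one.
func repointCurrent(appRoot, release string) error {
	target := filepath.Join(appRoot, "releases", release)
	tmp := filepath.Join(appRoot, "current.tmp")
	current := filepath.Join(appRoot, "current")

	os.Remove(tmp) // ignore error: the temp link may simply not exist yet
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	return os.Rename(tmp, current) // rename(2) replaces "current" atomically
}

func main() {
	if err := repointCurrent("/srv/myapp", "v42"); err != nil {
		log.Fatal(err)
	}
}

Nginx generally resolves the link on each request, so repointing it takes effect without reloading the proxy.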

For posterity's sake, I'll answer this question. As part of the deployment, all of the files were touched with chown -R, which updates the ctime (but not the mtime) of each file. There is also an interesting bug/feature in Passenger: it restarts the server whenever the mtime or ctime of the app's tmp/restart.txt file changes.
Solution: stop chowning the directory during a deployment.
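To see why the chown mattered, here is a small Linux-only Go sketch (the file name is hypothetical) showing that a successful chown, even to the file's current owner, bumps the file's ctime while leaving its mtime untouched, which is exactly the combination Passenger watches on restart.txt:

package main

import (
	"fmt"
	"log"
	"os"
	"syscall"
)

// times returns the mtime and ctime of a file. Linux-specific: ctime is
// only exposed through the raw syscall.Stat_t structure.
func times(path string) (mtime, ctime syscall.Timespec) {
	fi, err := os.Stat(path)
	if err != nil {
		log.Fatal(err)
	}
	st := fi.Sys().(*syscall.Stat_t)
	return st.Mtim, st.Ctim
}

func main() {
	path := "restart.txt" // stand-in for the app's tmp/restart.txt
	if err := os.WriteFile(path, []byte("x"), 0644); err != nil {
		log.Fatal(err)
	}
	m1, c1 := times(path)

	// chown to the current owner is enough for the kernel to update ctime.
	if err := os.Chown(path, os.Getuid(), os.Getgid()); err != nil {
		log.Fatal(err)
	}

	m2, c2 := times(path)
	fmt.Println("mtime changed:", m1 != m2, "ctime changed:", c1 != c2)
}

On filesystems with coarse timestamps the two calls can land in the same tick; sleeping a second between them makes the difference obvious.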

mongodb i/o timeout when using clustered mongo instances

I have an application that uses the upper.io/db package (a fairly simple wrapper around gopkg.in/mgo.v2) to communicate with a MongoDB server. The way the application works is that it creates a session in the main thread on start-up, and then each goroutine that needs to make requests to the Mongo server calls Clone on that session and defers a Close on the resulting value. As far as I can tell, this is all standard operating procedure.
This setup works without any errors in our development environments, where we use either a locally run MongoDB or a sandbox instance on MongoLab. Recently we promoted the application to our staging environment, where it talks to a Shared Cluster instance of MongoDB on MongoLab (the cheapest $15 option). This is where the weirdness starts. The first request that goes through (from the first goroutine that gets invoked) comes back with the expected response, but the subsequent ones all return
read tcp <ip address>:47112: i/o timeout
This happens both from our local development machines pointed at the cluster and from the AWS host for the staging environment. Since the Mongo cluster is hosted by MongoLab, I am going to assume that they've configured everything correctly on their end.
The code is somewhat boring, TBH: it just opens the session in the main function and keeps a reference to it, and then there are multiple goroutines with this basic structure:
sess := session.Clone()
defer sess.Close()
// make requests to Mongo
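For reference, a self-contained sketch of that pattern, using gopkg.in/mgo.v2 directly rather than through upper.io/db (the connection string and collection name are placeholders):

package main

import (
	"log"
	"sync"

	mgo "gopkg.in/mgo.v2"
	"gopkg.in/mgo.v2/bson"
)

func main() {
	// Root session created once at start-up.
	root, err := mgo.Dial("mongodb://user:pass@host1:27017,host2:27017/mydb")
	if err != nil {
		log.Fatal(err)
	}
	defer root.Close()

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each goroutine gets its own copy of the session and
			// returns it to the pool when it is done.
			sess := root.Clone()
			defer sess.Close()

			n, err := sess.DB("").C("items").Find(bson.M{}).Count()
			if err != nil {
				log.Println("query failed:", err)
				return
			}
			log.Println("items:", n)
		}()
	}
	wg.Wait()
}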
During testing, I even restricted it to run only one thing at once (i.e. only one goroutine is active at any given time), and it still fails in the same fashion.
Has anybody run into this before? Do I need to configure upper.io/db in a specific fashion? Maybe use mgo directly? I am at my wits' end with this :(
In a rather long and grueling process, we finally tracked down the source of this issue and similar ones in our program. It ended up being a session leak in the v1 version of the upper.io/db library. The bug and fix are outlined here, but the v1 version of the library is horribly outdated at this point and later versions do not exhibit this issue.
I doubt this answer will be useful for anybody so late in the game (especially since we ourselves solved it like.. 3 years ago at this point), but just wanted to leave the answer here for completeness.

starting warden after zookeeper of MapR

I am installing MapR and I am stuck at starting warden after starting zookeeper on a single node.
# service mapr-warden start
Error: warden can not be started. See /opt/mapr/logs/warden.log for details
There is no detail in that file. Does anybody have a hint? Thanks =)
If you aren't getting anything in warden.log, then it's likely that the warden JVM is never even being started by the mapr-warden init script.
In some MapR versions, the mapr-warden init script will log some details into /opt/mapr/logs/wardeninit.log. You can try checking there.
However, I will also caution that currently the logging done by the init script is sparse and not necessarily user friendly to read. If you can't discern the cause from the contents of the wardeninit.log you can post them here and maybe I can help.
Another thing you can do is edit /etc/init.d/mapr-warden and add "set -x" towards the top of the file, right before the "BASEMAPR=" line, then try starting warden again and you'll get a bunch of shell debugging output on your screen. If you copy and paste that output here that should be enough to tell the root cause of the problem.
One more thing to mention, you may be better off using the http://answers.mapr.com forum as that is MapR specific and I think there may be more users there that could help.
Was configure.sh (/opt/mapr/server/configure.sh -C nodeA -Z nodeA) run on the node? Did zookeeper come up successfully?
service mapr-zookeeper status
Even when using MapR on a single node, configure.sh is still required. In fact, without configure.sh, warden, zookeeper, CLDB and other MapR components will lack their configuration and in many cases will fail to start.
You must run configure.sh after installing the software packages (deb or rpm).

MongoDB replica set in Azure "Waiting for role to start... Calling OnRoleStart()"

I have a problem trying to implement a mongodb replica set as a worker role instance in Windows Azure. In the Windows Azure portal, one of the instances is shown as busy with the status:
Waiting for role to start... Calling OnRoleStart()
I have checked all the settings and everything seems to be ok, what could the problem be?
Denis Markelov's blog post helped me solve this problem. The solution is mainly his; however, I had to take an extra step to get it to work and thought others might find it useful.
Solution from blog:
Windows Azure reuses virtual machines for roles, so after a fresh deployment you can find files on the hard drive that were created during previous sessions. If MongoDB was terminated improperly, there might be a leftover lock file (a "persisted mutex" analogue) because of which MongoDB refuses to start. It is located on the drive labeled "WindowsAzureDrive" (say it is F:), at the path:
F:\data\mongod.lock
In production this situation might call for recovery procedures, but if you are just in the middle of the initial setup, it is safe to remove this file and let MongoDB start again.
I was having this problem and did as suggested, but the problem persisted. So I took a look at the log file at
C:\Resources\Directory\.MongoDB.WindowsAzure.MongoDBRole.MongodLogDir\mongod.txt
And saw that another file was also giving an error. In order to fix the problem, you also have to delete the file local.ns in the same directory as mongod.lock.
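Taken together, the fix amounts to deleting those two files before mongod is started again. A small illustrative Go sketch of that cleanup step (the drive letter and data path are the ones from the answer above and may differ in your deployment):

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	dataDir := `F:\data`
	// Remove the stale files that keep mongod from starting after an
	// unclean shutdown; a missing file is not treated as an error here.
	for _, name := range []string{"mongod.lock", "local.ns"} {
		path := filepath.Join(dataDir, name)
		if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
			fmt.Println("could not remove", path+":", err)
		} else {
			fmt.Println("removed (or already absent):", path)
		}
	}
}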

How can I deploy code 'on-demand' using chef client-server architecture?

Here's the scenario:
I can SSH into my Chef server, but I can't SSH into any of the Chef clients. So this is how I work: I have a workstation to change or create roles. All the chef-clients are running as daemons, so when they wake up, they notice state changes and start updating themselves.
Now, I need to configure code deployments on these clients. I was thinking I could use the application cookbook for that and add recipes to the roles from my workstation. But won't that result in a deployment every time the chef-clients wake up and find revision changes? I want an on-demand kind of deployment: I want to deploy only when the code is deployment-ready, not for every commit before that point.
How do I achieve this?
A couple of questions:
When is your code deployment-ready? How would you know? If it's a repeatable process, could you not code that into a recipe? If it's not a repeatable process, you need to make it one so that it can be automated.
I.e. run the cucumber tests, and if they all pass then deploy, else do nothing?
We feed from Artifactory and use the web API to check the latest installer available to us. If it's the same as the previously installed one (determined by checking/creating a registry key), we tell the user this build is already installed and skip it. If it's not the same, we install. Now I know this isn't the exact same scenario, but it feels to me like some custom code is going to be needed here.
Either that, or leverage data bag values to say install=true or false depending on the state of the code. You would update project A's install item in the data bag when you want to deploy, and the rest of the time it stays set to false. The recipe would only proceed if the value was true?
Why not have a branch where HEAD is always ready to be deployed? Only push to this branch when your code is ready to go out into the world. Then you don't have to worry about intermediate, unstable states of your repository being synced by chef. Of course, you still have to wait for a client to wake up and sync before you see your changes, so if latency is a problem this won't work.

Change site configuration without restarting G-WAN

I'm looking at hosting a number of small, static websites and have been looking at a few alternatives including G-WAN. At the moment I'm just trying to get a feel for how well each server suits my needs before picking one.
G-WAN seems to do exactly what I want, though I'm running into problems with updating the configuration (by adding new folders) after the server's started. I can't find anything in the documentation or online about this, so I don't know if I'm doing anything dumb, running an unsupported configuration, or whether it's a feature that doesn't exist in G-WAN.
Here's my setup:
G-WAN 3.3.28 64-bit on Ubuntu 12.04.1 LTS.
I have what I think is the required minimal folder structure:
0.0.0.0_80
    #0.0.0.0
        www
    $site.com
        www
    $othersite.com
        www
I start up gwan with (I'm still messing around, so hopefully this is OK):
sudo ./gwan -d
Everything works brilliantly. I add $thirdsite.com/, $thirdsite.com/www/, and $thirdsite.com/www/index.html; then when I try to visit thirdsite.com it serves me the root host (i.e. it doesn't seem to pick up the changes).
To reload the modified configuration, I have to either do:
sudo ./gwan -k; sudo ./gwan -d
or kill the non-angel process (kill -s 15) to restart the child process.
Can G-WAN reload the host definitions another way? If so, does it work out of the box, or is there a command that can cycle the server without dropping requests made to other hosts? (Is it safe to kill -s 15 the non-angel process, and if so, is there a reliable way to identify that process?) Thanks in advance!
G-WAN loads the host definitions at startup and does not re-check them at run time to reload them dynamically.
To force a reload, you have to stop the child process (when in daemon mode); v3.9+ keeps the old child alive long enough to process any pending requests while the new child accepts new connections.
Since stopping the child can also be done from the maintenance script, from a handler, or from a servlet by simply calling exit(0), there is no need for a dedicated command.
Note that when you use kill you can pick up the pid from the pid files in the gwan directory:
the parent process starts with a capital letter: Gwan_xxxx.pid
the child process starts with a lowercase letter: gwan_xxxx.pid
That will make your life easier.
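If you script the kill approach, a small Go helper can pick out the child's pid file by its lowercase name (this assumes it runs from the gwan directory and that the pid files follow the naming above; it is a sketch, not part of G-WAN):

package main

import (
	"log"
	"os"
	"path/filepath"
	"strconv"
	"strings"
	"syscall"
)

func main() {
	// The lowercase gwan_*.pid file belongs to the child process; the
	// capitalized Gwan_*.pid file belongs to the parent (angel) process.
	matches, err := filepath.Glob("gwan_*.pid")
	if err != nil || len(matches) == 0 {
		log.Fatal("no child pid file found")
	}
	data, err := os.ReadFile(matches[0])
	if err != nil {
		log.Fatal(err)
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		log.Fatal(err)
	}
	// kill -s 15 (SIGTERM) on the child restarts it, which reloads the
	// host definitions as described above.
	if err := syscall.Kill(pid, syscall.SIGTERM); err != nil {
		log.Fatal(err)
	}
	log.Printf("sent SIGTERM to child pid %d", pid)
}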