Capistrano 3: Run task only on a single server from a pool of servers assigned a role - capistrano

I have 20 servers that are in the "web" role. I have a task I need to perform on only one of them as the change affects shared storage. My current solution is a hack to get around this (below). Looking for a better way, I don't have a ton of ruby or cap experience.
task :checkout_project_properties do
  num_runs = 0
  on roles(:web), in: :sequence do
    if num_runs > 0
      abort('Only running on one server. Exiting')
    end
    execute("checkout-project-properties #{uc_stage} #{repo} #{branch}")
    num_runs += 1
  end
end

I assume that you are referring to your production configuration, with so many web servers. In this case, your config/deploy/production.rb probably contains many lines like this:
server 'web_1', roles: %w(web)
server 'web_2', roles: %w(web)
server 'web_3', roles: %w(web)
...
Simply make one of these servers primary, so it looks like:
server 'web_1', roles: %w(web), primary: true
server 'web_2', roles: %w(web)
server 'web_3', roles: %w(web)
...
Then change your task so it looks like this:
task :checkout_project_properties do
  on primary(:web) do
    execute("checkout-project-properties #{uc_stage} #{repo} #{branch}")
  end
end
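If you would rather not edit the stage file at all, a sketch of an alternative is to pick one server out of the role directly; roles(:web) returns an array of server objects and on accepts a single server, so the following (untested here) runs the command on just the first :web server:

task :checkout_project_properties do
  # Run on only the first server that carries the :web role.
  on roles(:web).first do
    execute("checkout-project-properties #{uc_stage} #{repo} #{branch}")
  end
end

(primary(:web) falls back to the first server in the role when none is flagged primary: true, so both variants pick exactly one host.)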

Related

How to connect Searchkick (in a Rails app and/or Sidekiq job) to multiple Elasticsearch clusters without stomping on the global Searchkick config?

Upon startup my app sets my (global?) Searchkick client to point at my default Elasticsearch cluster.
Searchkick.client = Elasticsearch::Client.new(
  hosts: default_cluster, # this is the list of hosts in my default cluster
  retry_on_failure: true,
)
However, I am upgrading my cluster (again), and while that happens I'd like my app to keep reading/searching from that default cluster, i.e.
/search?q="some term"
# =>
Model.search("some term")
should continue to work against the default_cluster.
Where it starts to get a bit tricky is that:
I'd also like (via some specific Sidekiq background jobs) to fill an alternate (alt) cluster's index, something like:
Model.connect_to(alternate_cluster) { |client|
  Searchkick.client = client
  Model.reindex
}
Without causing all other background jobs to interact with the alternate cluster.
And, of course:
I'd like some way to verify that the alternate_cluster is working well (i.e. for search) before making it my default_cluster. And presumably via some admin route:
/admin/search?q="some search term"&cluster=alternate
# =>
Model.connect_to(alternate_cluster) { |client|
  Searchkick.client = client
  Model.search("some term")
}
And finally:
I'd like to avoid having to reconnect before every search/reindex action, i.e. I'd prefer not to pay the overhead of switching clients each time (it also probably means that long-running tasks which keep reconnecting would swap back and forth from one cluster to the other):
Model.search("some term")
# =>
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.search("some term")
}
^ I don't want that
FWIW, the best I've been able to come up with so far is something like:
def self.connect_to(current_cluster, &block)
  previous_es_client = Searchkick.client
  current_es_client = Elasticsearch::Client.new(
    hosts: current_cluster,
    retry_on_failure: true,
  )
  block.call(current_es_client)
rescue Exception => e
  logger.warn(e)
ensure
  Searchkick.client = previous_es_client
end
But I suspect that will cause every other interaction within my system (via the same web worker, or other background jobs running in the same background-worker instance) to (temporarily) point at the alternate cluster.
Thanks in advance for your assistance...

How to stop a Postgrex process and its TypeServer process?

I have a few thousand databases. I want to connect to each of them in series one by one and issue a query. I do this by starting a Postgrex process like this for each one.
{:ok, pid} =
  Postgrex.start_link(
    port: database.port,
    hostname: database.host,
    username: database.username,
    password: database.password,
    database: database.database_name
  )
I then issue a Postgrex.query, and then stop the process like this:
:ok = GenServer.stop(pid, :normal)
Everything seems to work fine, except that I end up with thousands of Postgrex.TypeServer processes eating up memory that don't seem to get cleaned up for quite a while.
Is there a better way to clean up a Postgrex process so that the TypeServer is also stopped?
I'm on Postgrex 0.13.3.
EDIT:
To clarify things a bit, I'd like to clean up the TypeServer after each Postgrex process is stopped. Cleaning up all the TypeServers after the entire Enum.map is done is not all that useful to me because it results in memory slowly growing followed by a sharp drop rather than a flat line.
Enum.map(databases, fn database ->
  {:ok, pid} =
    Postgrex.start_link(
      port: database.port,
      hostname: database.host,
      username: database.username,
      password: database.password,
      database: database.database_name
    )

  Postgrex.query!(pid, "some query", [])
  :ok = GenServer.stop(pid, :normal)
  # something here to clean up the TypeServer
end)
I did not dig all the way to the bottom, but Postgrex.TypeServer processes are managed by a dynamic Postgrex.TypeSupervisor, which is luckily started with a hard-coded name.
So my wild guess would be that the following should do:
DynamicSupervisor.stop(Postgrex.TypeSupervisor)
Alternatively, shutting down the Postgrex application (Postgrex.App) should also help:
Application.stop(:postgrex)
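Putting that together with the loop from the question, a rough, untested sketch (it assumes that stopping the named type supervisor actually tears down the lingering TypeServer children, and that Postgrex's own supervision tree restarts it before the next iteration) could look like:

Enum.map(databases, fn database ->
  {:ok, pid} =
    Postgrex.start_link(
      port: database.port,
      hostname: database.host,
      username: database.username,
      password: database.password,
      database: database.database_name
    )

  result = Postgrex.query!(pid, "some query", [])
  :ok = GenServer.stop(pid, :normal)

  # Assumption: this stops the hard-coded Postgrex.TypeSupervisor and, with it,
  # the TypeServer started for this database; the supervisor should come back
  # up before the next connection is opened.
  DynamicSupervisor.stop(Postgrex.TypeSupervisor)

  result
end)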

Ansible Copy Module Fails

I am trying to copy over the resolv.conf file from one machine to another and overwrite the old one. This operation works on all but 4 of the 40+ servers... I get an error that it could not replace the file because the operation is not permitted. I have pasted the relevant contents of the playbook below.
- hosts: all
  remote_user: root
  ...
  - name: Copy over the updated DNS configuration file
    copy: src=/etc/resolv.conf dest=/etc/resolv.conf
It gives me the following error message for all 4 servers.
fatal: [server-name]: FAILED! => {"changed": false, "checksum": "9925f1a81f849f373f860c3156d19edcd1c002f2", "failed": true, "msg": "Could not replace file: /root/.ansible/tmp/ansible-tmp-1469481567.72-275811900408782/source to /etc/resolv.conf: [Errno 1] Operation not permitted"}
I just don't understand what the problem could be since I am accessing the machines as the root user and the Playbook succeeds on the majority of the servers - many with the exact same configuration and settings. For example, it succeeds on the server "server-analytical1" but fails on the server "server-analytical2". So, does anyone have any insight into why the Playbook would fail for only a few servers even though they're similar to or the same as other servers that succeeded?
Is the immutable bit set on the target file? Run lsattr /etc/resolv.conf to check, and chattr -i /etc/resolv.conf to unset it if it is.
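If that is the cause, one way to handle it from the playbook itself is to clear the attribute right before the copy. A sketch (task names are made up, and it assumes chattr is available on the targets):

- name: Drop the immutable bit so /etc/resolv.conf can be replaced
  command: chattr -i /etc/resolv.conf

- name: Copy over the updated DNS configuration file
  copy: src=/etc/resolv.conf dest=/etc/resolv.conf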

cannot attach to service manager-error

I am new to Firebird and I would like to trace my Firebird database activities, hence I am trying to use the Audit/Trace Services.
My Firebird database is on the server 10.7.105.8.
I am running this command in my cmd:
C:\Program Files\Firebird\Firebird_2_5\bin>fbtracemgr -se 10.7.105.8:3050:service_mgr -user SYSDBA -password masterkey -start -name "User Trace 1" -config "fbtrace.conf" > C:\Users\Babak\Desktop\trace.out
but I get this error:
Can not attach to service manager
Service 3050 : Service_mgr is not defined
What should I do to solve this problem?
Thank you so much.
EDIT
Thank you for your hints. I think my trace process works fine, but I can't find the information I need in my trace.out file.
When I start my trace and take a look in my trace.out, I can only see this:
Trace Session ID 3 Started
I then run some select queries in Firebird and finish my trace with Ctrl+C, and then the only things which I can see in my trace.out are something like this:
Trace session ID 3 started
2015-07-08 10:49:59.868874 ***** loading fbclient.dll proc=4116 64Bit DLL Preload
2015-07-08 10:49:59.869066 GetDllDirectoryA=""
2015-07-08 10:49:59.869075 GetModuleFileNameA="C:\Program Files\Firebird\Firebird_2_5\bin\fbclient.dll"
2015-07-08 10:49:59.869086 Log-Level is set to 0
2015-07-08 10:49:59.869096 fbclient.dll loaded by: C:\Program Files\Firebird\Firebird_2_5\bin\fbtracemgr.exe
2015-07-08 10:49:59.869113 ***** dimensio integration successfully fbclient.dll
2015-07-08 10:58:10.091330 ***** cleanup unload fbclientorg.dll proc=4116
and no more info about the queries which I have run.
Could you please tell me what I have done wrong, or what I should do differently?
As Mark says, check the file "fbtrace.conf". This is a text file and you will see something like this:
# default database section
#
<database>
    # Do we trace database events or not
    enabled false

    # Operations log file name. For use by system audit trace only
    #log_filename

    ....
    ....

    # Put transaction start/end records
    log_transactions false          <--- TO TEST, SET THIS TO TRUE

    # Put sql statement prepare records
    log_statement_prepare false     <--- TO TEST, SET THIS TO TRUE
Set to true what you need to trace, save the file and check the result.
Firebird connection strings are of the format:
host/port:database
Where /port is optional and defaults to 3050, and database is either the alias or path of a database, or the name of a service. Replace :3050 with /3050 (or leave it off entirely).
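Applied to the command from the question, that means (only the port separator changes):

fbtracemgr -se 10.7.105.8/3050:service_mgr -user SYSDBA -password masterkey -start -name "User Trace 1" -config "fbtrace.conf" > C:\Users\Babak\Desktop\trace.out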
The following worked for me:
Open start menu
Search for services and open it
Search for Firebird Guardian in the services list.
Start Firebird Guardian if it is stopped, or restart it if it is running.
Now try to connect to your server. It will work.

uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac

I'm trying to set up an application web server using uWSGI + Nginx, which runs a Flask application using SQLAlchemy to communicate with a Postgres database.
When I make requests to the webserver, every other response will be a 500 error.
The error is:
Traceback (most recent call last):
  File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
    context)
  File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
    cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
The above exception was the direct cause of the following exception:
sqlalchemy.exc.OperationalError: (OperationalError) SSL error: decryption failed or bad record mac
The error is triggered by a simple Flask-SQLAlchemy method:
result = models.Event.query.get(id)
uwsgi is being managed by supervisor, which has a config:
[program:my_app]
command=/usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini --catch-exceptions
directory=/path/to/my/app
stopsignal=QUIT
autostart=true
autorestart=true
and uwsgi's config looks like:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
The furthest I've gotten is that it has something to do with uWSGI's forking, but beyond that I'm not clear on what needs to be done.
The issue ended up being uwsgi's forking.
When working with multiple processes with a master process, uwsgi initializes the application in the master process and then copies the application over to each worker process. The problem is if you open a database connection when initializing your application, you then have multiple processes sharing the same connection, which causes the error above.
The solution is to set the lazy configuration option for uwsgi, which forces a complete loading of the application in each process:
lazy
Set lazy mode (load apps in workers instead of master).
This option may have memory usage implications as Copy-on-Write semantics can not be used. When lazy is enabled, only workers will be reloaded by uWSGI’s reload signals; the master will remain alive. As such, uWSGI configuration changes are not picked up on reload by the master.
There's also a lazy-apps option:
lazy-apps
Load apps in each worker instead of the master.
This option may have memory usage implications as Copy-on-Write semantics can not be used. Unlike lazy, this only affects the way applications are loaded, not master’s behavior on reload.
This uwsgi configuration ended up working for me:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
# the fix
lazy = true
lazy-apps = true
As an alternative, you might dispose of the engine. This is how I solved the problem.
Such issues may happen if there is a query during the creation of the app, that is, in the module that creates the app itself. If that happens, the engine allocates a pool of connections and then uWSGI forks.
By invoking engine.dispose(), the connection pool itself is closed and new connections will come up as soon as someone starts making queries again. So if you do that at the end of the module where you create your app, new connections will be created after the uWSGI fork.
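A minimal sketch of that idea, assuming a Flask-SQLAlchemy setup where db is the SQLAlchemy() instance (names and connection string here are illustrative, not taken from the question):

# application.py
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://user:pass@localhost/mydb"
db = SQLAlchemy(app)

# ... imagine some warm-up query at import time that checks out a connection ...

# Close the pool that was filled while the uWSGI master imported this module,
# so each worker opens fresh connections after the fork.
with app.app_context():
    db.engine.dispose()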
I am running a flask app using gunicorn on Heroku. My application started exhibiting this problem when I added the --preload option to my Procfile. When I removed that option, my application resumed functioning as normal.
Not sure whether to add this as an answer to this question or ask a separate question and put this as an answer there. I was getting this exact same error for reasons slightly different from those of the people who have posted and answered. In my setup, I am using gunicorn as a WSGI server for a Flask application. In this application, I offload some intense database operations to a Celery worker, and the error would come from the Celery worker.
From reading a lot of the answers here and looking at the psycopg2 as well as the SQLAlchemy session documentation, it became apparent to me that it is a bad idea to share an SQLAlchemy session between separate processes (the gunicorn worker and the Celery worker in my case).
What ended up solving this for me was creating a new session in the Celery worker function, so it used a new session each time it was called, and also destroying the session after every web request so Flask used a session per request. The overall solution looked like this:
Flask_app.py
@app.teardown_appcontext
def shutdown_session(exception=None):
    session.close()
celery_func.py
@celery_app.task(bind=True, throws=(IntegrityError,))
def access_db(self, entity_dict, tablename):
    # ORM_obj is built from entity_dict / tablename elsewhere in the task
    with Session() as session:
        try:
            session.add(ORM_obj)
            session.commit()
        except IntegrityError as e:
            session.rollback()
            print('primary key violated')
            raise e
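The snippets above reference Session and session without showing where they come from; a sketch of one plausible setup (module name and connection string are assumptions, not from the answer):

# db_session.py
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine("postgresql://user:pass@localhost/mydb")

# Session() hands the Celery task a fresh session per call, while `session`
# is the scoped session that the Flask teardown hook closes after each request.
Session = sessionmaker(bind=engine)
session = scoped_session(Session)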