How to connect searchkick (in a Rails app &/ Sidekiq job) to multiple elasticsearch clusters without stomping on global searckick config? - searchkick

Upon startup my app sets my (?global?) searchkick client to point at my default elasticsearch cluster.
Searchkick.client = Elasticsearch::Client.new(
hosts: default_cluster, # this is the list of hosts in my default cluster
retry_on_failure: true,
)
However, I am upgrading my cluster (again), and while I'd like to be able to have my app read/search from that default cluster,
/search?q="some term"
# =>
Model.search("some term")
continue to work against the default_cluster
Where it starts to get a bit tricky is that:
I'd also like (via some specific ?sidekiq background jobs?) to fill an alternate (alt) cluster's index, something like:
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.reindex
}
Without causing all other background jobs to interact with the alternate cluster.
And, of course:
I'd like some way to verify that the alternate_cluster is working well (i.e. for search) before making it my default_cluster. And presumably via some admin route:
/admin/search?q="some search term"&cluster=alternate
# =>
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.search("some term")
}
And finally:
I'd like to avoid having to reconnect before every search/reindex action, i.e. I'd prefer not to have the overhead of changing (also because that probably implies that long-running tasks that continue to reconnect to searchkick will be swapping back and-forth from one cluster to the other):
Model.search("some term")
# =>
Model.connect_to(alternate_cluster) {|client|
Searchkick.client = client
Model.search("some term")
}
^ I don't want that
FWIW, the best I've been able to come-up with so far is something like:
def self.connect_to(current_cluster, &block)
previous_es_client = Searchkick.client
current_es_client = Elasticsearch::Client.new(
hosts: current_cluster,
retry_on_failure: true,
)
block.call(current_es_client)
rescue Exception => e
logger.warn(e)
ensure
Searchkick.client = previous_es_client
end
But, I suspect that will cause every other interaction within my system (via the same web-worker or other background jobs running in the same background-worker-instance) to (temporarily) point at the alternate cluster.
Thanks in advance for your assistance...

Related

Akka cluster sharding configuration

Ok, I have 2 instances of my backend, hosted on 2 difference centos servers. What I want to do using Akka Cluster Sharding is to divide the work done by each of these instances:
I have data for 4 countries, which is retrieved from db at every 10 seconds by both backend instances, which update a Redis instance. So, multiple times, I have duplicated requests, because both backends get data for same country;
Using Akka Cluster Sharding, I try to divide the work dinamically, instance1 to get data for ES and EN, instance2 to get data for DE and IT. In case of instance1 is down, instance2 will take the jobs and will get data even for ES/EN.
I tought this is simple...but not.
All jobs are done by Akka Actors, so using Cluster Sharding, I thought all declared actors (from both instances) will be centralized somewhere, to can manipulate which do whatever job.
On localhost, all works fine, because I have an instance for my app with port 9001 and 2 cluster nodes with ports 2551 and 2552. But for production, I can't understand how to configure the hostnames
application.conf
"clusterRegistration" {
akka {
actor {
allow-java-serialization = on
provider = cluster
}
remote.artery {
enabled = on
transport = aeron-udp
}
cluster {
jmx.multi-mbeans-in-same-jvm = on
seed-nodes = [
"akka://ClusterService#instance1:8083",
"akka://ClusterService#instance1:2551"
]
}
}
}
class
object ClusterSharding {
def createNode(hostname: String, port: Int, role: String, props: Props, actorName: String) = {
val config = ConfigFactory.parseString(
s"""
|akka.cluster.roles = ["$role"]
|akka.remote.artery.canonical.hostname = $hostname
|akka.remote.artery.canonical.port = $port
|""".stripMargin
).withFallback(ConfigFactory.load
.getConfig("clusterRegistration"))
val system = ActorSystem("ClusterService", config)
system.actorOf(props, actorName)
}
val master = createNode("instance1", 8083, "master", Props[Master], "master")
createNode("instance1", 2551, "worker", Props[Worker], "worker")
createNode("instance2", 8083, "worker", Props[Worker], "worker")
Future {
while (true) {
master ! Proceed // this will fire an Actor Resolver case
Thread.sleep(5000)
}
}
}
master actor
class Master extends Actor {
var workers: Map[Address, ActorRef] = Map()
val cluster = Cluster(context.system)
override def preStart(): Unit = {
cluster.subscribe(
self,
initialStateMode = InitialStateAsEvents,
classOf[MemberEvent],
classOf[UnreachableMember]
)
}
override def postStop(): Unit = {
cluster.unsubscribe(self)
}
def receive = handleClusterEvents // cluster events
.orElse(handleWorkerRegistration) // worker registered to cluster
.orElse(handleJob) // give jobs to workers
def handleJob: Receive = {
case Proceed => {
// Here I must be able to use all workers from both instances
// (centos1 and centos2) and give work for each dinamically
if (workers.length == 2) {
worker1 ! List("EN", "ES")
worker2 ! List("DE", "IT")
} else if (workers.length == 1) {
worker ! List("EN", "ES", "DE", "IT")
} else {
execQueries() // if no worker is available, each backend instance will exec queries on his own way
}
}
}
}
Both instances are hosted with port 8083 (centos1: instance1:8083, centos2: instance2:8083). If I use settings just for one of the instances in application.conf and in createNode (instance1 for example), I can see in logs that the workers are created, but there is no communication with the second instance.
Where I'm wrong? thx
Your approach to configuring the hostnames is viable. There are better ways to do it (depending on how you're deploying the service: manual deploy vs. ansible/chef/puppet vs. docker vs. kubernetes/nomad/mesos will be different), but setting the hostname isn't likely your actual problem.
Your current approach will give you a master and 2 workers on every node and you're not actually using Cluster Sharding (you're using Cluster, but Cluster Sharding is something you opt into on top of Cluster). From the code you've posted, I strongly suspect that using Cluster Sharding will entail a dramatic redesign (though without posting the Worker and more complete Master code, it's hard to say).
The broad approach I'd take with this would be to have the process of updating Redis for a given country be owned by a sharded entity (keyed by that country). A cluster singleton actor would trigger the update process for each country every 10 seconds. Because we're using sharding and singleton, I'd probably actually have at least 3 instances of the service, or alternatively make use of a strongly consistent external lease system (the other split-brain resolution strategies (note that cluster sharding and cluster singleton basically force you to resolve split-brains) will all boil down, at least half the time, to losing one node is the same as losing both in a 2-node cluster). Because sharding implies that the actor for a process could be stopped arbitrarily (and possibly resumed on a different node), you'll also want to think about how the process can be resumed in a way that makes sense for the application.
Starting multiple ActorSystems in the same JVM process is generally only a good idea in fairly specific circumstances.

How to send logs to Cloudwatch from a background process running in AWS Fargate containers?

I'm using Fargate. My container is running two processes. Celery worker in the background and Django in the foreground. The foreground process emits logs to stdout, hence AWS takes care of sending Django logs to concerned Cloudwatch Log Group and Stream.
Since its running in the background, how do send the celery worker's logs to (a different Log Stream within same Log Group) Cloudwatch?
If there's no way to move the second process to the separate container and log it as usual, you may install awslogs package to the container and set it up to read background process' log files and send content to CloudWatch.
But I'd not recommend such approach.
Again this is not necessarily a Fargate based issue or question. For logging in celery check this out http://docs.celeryproject.org/en/latest/userguide/tasks.html#logging.
The worker won’t update the redirection if you create a logger instance somewhere in your task or task module.
If you want to redirect sys.stdout and sys.stderr to a custom logger you have to enable this manually, for example:
import sys
logger = get_task_logger(__name__)
#app.task(bind=True)
def add(self, x, y):
old_outs = sys.stdout, sys.stderr
rlevel = self.app.conf.worker_redirect_stdouts_level
try:
self.app.log.redirect_stdouts_to_logger(logger, rlevel)
print('Adding {0} + {1}'.format(x, y))
return x + y
finally:
sys.stdout, sys.stderr = old_outs
And for logging with Fargate, i would use the awslogs driver. Below is how you configure as documented here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html
If using the console:
Or this if in cloudformation template:
Like #OK999 said, Celery is designed to swallow logs whether its on Fargate or not. We ended up using a Django LOGGING config like:
LOGGING = {
'version': 1,
# This only "disables" but the loggers don't propagate
# 'disable_existing_loggers': False,
...
'handlers': {
'console': {
'level': env.str('LOGGING_LEVEL'),
'class': 'logging.StreamHandler',
'formatter': 'verbose',
},
},
'loggers': {
...
# celery won't route logs to console without this
'celery': {
# filtered at the handler
'level': logging.DEBUG,
'handlers': ['console'],
},
...
We had to make this change long before transitioning to Fargate.

Buildbot slaves priority

Problem
I have set up a latent slave in buildbot to help avoid congestion.
I've set up my builds to run either in permanent slave or latent one. The idea is the latent slave is waken up only when needed but the result is that buildbot randomly selectes one slave or the other so sometimes I have to wait for the latent slave to wake even if the permanent one is idle.
Is there a way to prioritize buildbot slaves?
Attempted solutions
1. Custom nextSlave
Following #david-dean suggestion, I've created a nextSlave function as follows (updated to working version):
from twisted.python import log
import traceback
def slave_selector(builder, builders):
try:
host = None
support = None
for builder in builders:
if builder.slave.slavename == 'host-slave':
host = builder
elif builder.slave.slavename == 'support-slave':
support = builder
if host and support and len(support.slave.slave_status.runningBuilds) < len(host.slave.slave_status.runningBuilds):
log.msg('host-slave has many running builds, launching build in support-slave')
return support
if not support:
log.msg('no support slave found, launching build in host-slave')
elif not host:
log.msg('no host slave found, launching build in support-slave')
return support
else:
log.msg('launching build in host-slave')
return host
except Exception as e:
log.err(str(e))
log.err(traceback.format_exc())
log.msg('Selecting random slave')
return random.choice(buildslaves)
And then passed it to BuilderConfig.
The result is that I get this in twistd.log:
2014-04-28 11:01:45+0200 [-] added buildset 4329 to database
But the build never starts, in the web UI it always appear as Pending and none of the logs I've put appear in twistd.log
2. Trying to mimic default behavior
I've having a look to buildbot code, to see how it is done by default.
in file ./master/buildbot/process/buildrequestdistributor.py, class BasicBuildChooser you have:
self.nextSlave = self.bldr.config.nextSlave
if not self.nextSlave:
self.nextSlave = lambda _,slaves: random.choice(slaves) if slaves else None
So I've set exactly that lambda function in my BuilderConfig and I'm getting exactly the same build not starting result.
You can set up a nextSlave function to assign slaves to a builder in a custom manner see: http://docs.buildbot.net/current/manual/cfg-builders.html#builder-configuration

uWSGI, Flask, sqlalchemy, and postgres: SSL error: decryption failed or bad record mac

I'm trying to setup an application webserver using uWSGI + Nginx, which runs a Flask application using SQLAlchemy to communicate to a Postgres database.
When I make requests to the webserver, every other response will be a 500 error.
The error is:
Traceback (most recent call last):
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/base.py", line 867, in _execute_context
context)
File "/var/env/argos/lib/python3.3/site-packages/sqlalchemy/engine/default.py", line 388, in do_execute
cursor.execute(statement, parameters)
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
The above exception was the direct cause of the following exception:
sqlalchemy.exc.OperationalError: (OperationalError) SSL error: decryption failed or bad record mac
The error is triggered by a simple Flask-SQLAlchemy method:
result = models.Event.query.get(id)
uwsgi is being managed by supervisor, which has a config:
[program:my_app]
command=/usr/bin/uwsgi --ini /etc/uwsgi/apps-enabled/myapp.ini --catch-exceptions
directory=/path/to/my/app
stopsignal=QUIT
autostart=true
autorestart=true
and uwsgi's config looks like:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
The furthest that I can get is that it has something to do with uwsgi's forking. But beyond that I'm not clear on what needs to be done.
The issue ended up being uwsgi's forking.
When working with multiple processes with a master process, uwsgi initializes the application in the master process and then copies the application over to each worker process. The problem is if you open a database connection when initializing your application, you then have multiple processes sharing the same connection, which causes the error above.
The solution is to set the lazy configuration option for uwsgi, which forces a complete loading of the application in each process:
lazy
Set lazy mode (load apps in workers instead of master).
This option may have memory usage implications as Copy-on-Write semantics can not be used. When lazy is enabled, only workers will be reloaded by uWSGI’s reload signals; the master will remain alive. As such, uWSGI configuration changes are not picked up on reload by the master.
There's also a lazy-apps option:
lazy-apps
Load apps in each worker instead of the master.
This option may have memory usage implications as Copy-on-Write semantics can not be used. Unlike lazy, this only affects the way applications are loaded, not master’s behavior on reload.
This uwsgi configuration ended up working for me:
[uwsgi]
socket = /tmp/my_app.sock
logto = /var/log/my_app.log
plugins = python3
virtualenv = /path/to/my/venv
pythonpath = /path/to/my/app
wsgi-file = /path/to/my/app/application.py
callable = app
max-requests = 1000
chmod-socket = 666
chown-socket = www-data:www-data
master = true
processes = 2
no-orphans = true
log-date = true
uid = www-data
gid = www-data
# the fix
lazy = true
lazy-apps = true
As an alternative you might dispose the engine. This is how I solved the problem.
Such issues may happen if there is a query during the creation of the app, that is, in the module that creates the app itself. If that states, the engine allocates a pool of connections and then uwsgi forks.
By invoking 'engine.dispose()', the connection pool itself is closed and new connections will come up as soon as someone starts making queries again. So if you do that at the end of the module where you create your app, new connections will be created after the UWSGI fork.
I am running a flask app using gunicorn on Heroku. My application started exhibiting this problem when I added the --preload option to my Procfile. When I removed that option, my application resumed functioning as normal.
Not sure whether to add this as an answer to this question or ask a separate question and put this as an answer there. I was getting this exact same error for reasons that are slightly different from the people who have posted and answered. In my setup, I using gunicorn as a wsgi for a Flask application. In this application, I was offloading some intense database operations off to a celery worker. The error would come from the celery worker.
From reading a lot of the answers here and looking at the psycopg2 as well as sqlalchemy session documentation, it became apparent to me that it is a bad idea to share an SQLAlchemy session between separate processes (the gunicorn worker and the sqlalchemy worker in my case).
What ended up solving this for me was creating a new session in the celery worker function so it used a new session each time it was called and also destroying the session after every web request so flask used a session per request. The overall solution looked like this:
Flask_app.py
#app.teardown_appcontext
def shutdown_session(exception=None):
session.close()
celery_func.py
#celery_app.task(bind=True, throws=(IntegrityError))
def access_db(self,entity_dict, tablename):
with Session() as session:
try:
session.add(ORM_obj)
session.commit()
except IntegrityError as e:
session.rollback()
print('primary key violated')
raise e

How define twice the same service in Puppet?

In order to deploy Varnish with a Puppet class, I need stop Varnish for move and deploy files, then at the end, ensure that Varnish is started.
My problem is simple, how I can define twice a service in a Puppet class, in order to stop or start the service at differents steps ?
class varnish::install (
(...)
service { "varnish":
ensure => "stopped",
require => Package['varnish'],
before => Exec['mv-lib-varnish'],
}
(...)
service { "varnish":
ensure => "running",
require => File["$varnishncsa_file"],
}
}
I've an Duplicate definition: Service[varnish] (...) error, and it's logical...
What's the best practice to manage services in a Puppet class ? Divide in multiple classes, or there is an option for "rename" a service for declare it several times ?
try the following to get rid of duplicate error, but what you are trying to do is wrong.
Puppet brings system to certain consistent state - so telling stop service X, do some work , start service X - it out of scope of proper puppet use, puppet is more like restart service if some files on which the service depends were modified.
class varnish::install (
(...)
service { "varnish-stop":
name => "varnish"
ensure => "stopped",
require => Package['varnish'],
before => Exec['mv-lib-varnish'],
}
(...)
service { "varnish-start":
name => "varnish"
ensure => "running",
require => File["$varnishncsa_file"],
}
}
Use exec with service restart as a hook (notify) for "deploy files" action (package/another exec). Define service itself only once as running, because that is what you normally want assuring. Puppet is for describing target state.