How to check if symfony messenger is working - worker

I have a pod running in kubernetes / aws cloud. Due to limited configuration options in a custom deployment process (not my fault!!) I cannot start the symfony messenger as you usually would start it. What I have to do after a deployment is log into the shell and manually do
bin/console messenger:consume my_kafka_messages
Of course as soon as the pod for any reason is automatically restarted my worker isn't running. So until we can change the company deployment process I have to make sure to at least get notice if the worker isn't running.
Is there any option to e.g. run an symfony command which checks if the worker is running? If that was possible I could let the system start the worker or at least send me a notification.
How about
bin/console debug:messenger
?
If I do that and get e.g. this output is this sign that the worker is running? Or is it just the configuration of a worker, which could run, if it were started and may or may not run currently?
$ bin/console deb:mess
Messenger
=========
events
------
The following messages can be dispatched:
--------------------------------------------------
#codeCoverageIgnore
App\Domain\KafkaEvents\ProductPictureCollection
handled by App\Handler\ProductPictureHandler
--------------------------------------------------
Of course I can do a crude approach and check the db, which logs the processed datasets. But t is always possible that for e.g. 5 days there are no data to process. In that case I would get false alarms although everything is fine.
So checking directly if the worker is running would be much better, but I have no idea how to do it.

Related

How to run MirrorMaker 2.0 in production?

From the documentation, I see MirrorMaker 2.0 like this on the command line -
./bin/connect-mirror-maker.sh mm2.properties
In my case I can go to an EC2 instance and enter this command.
But what is the correct practice if this needs to run for many days. It is possible that EC2 instance can get terminated etc.
Therefore, I am trying to find out what are best practices if you need to run MirrorMaker 2.0 for a long period and want to make sure that it is able to stay up and running without any manual intervention.
You have many options, these include:
Add it as a service to systemd. Then you can specify that it should be started automatically and restarted on failure. systemd is very common now, but if you're not running systemd, there are many other process managers. https://superuser.com/a/687188/80826.
Running in a Docker container, where you can specify a restart policy.

.NET Core / Kubernetes - SIGTERM, clean shutdown

I'm trying to verify that shutdown is completing cleanly on Kubernetes, with a .NET Core 2.0 app.
I have an app which can run in two "modes" - one using ASP.NET Core and one as a kind of worker process. Both use Console and JSON-which-ends-up-in-Elasticsearch-via-Filebeat-sidecar-container logger output which indicate startup and shutdown progress.
Additionally, I have console output which writes directly to stdout when a SIGTERM or Ctrl-C is received and shutdown begins.
Locally, the app works flawlessly - I get the direct console output, then the logger output flowing to stdout on Ctrl+C (on Windows).
My experiment scenario:
App deployed to GCS k8s cluster (using helm, though I imagine that doesn't make a difference)
Using kubectl logs -f to stream logs from the specific container
Killing the pod from GCS cloud console site, or deleting the resources via helm delete
Dockerfile is FROM microsoft/dotnet:2.1-aspnetcore-runtime and has ENTRYPOINT ["dotnet", "MyAppHere.dll"], so not wrapped in a bash process or anything
Not specifying a terminationGracePeriodSeconds so guess it defaults to 30 sec
Observing output returned
Results:
The API pod log streaming showed just the immediate console output, "[SIGTERM] Stop signal received", not the other Console logger output about shutdown process
The worker pod log streaming showed a little more - the same console output and some Console logger output about shutdown process
The JSON logs didn't seem to pick any of the shutdown log output
My conclusions:
I don't know if Kubernetes is allowing the process to complete before terminating it, or just issuing SIGTERM then killing things very quick. I think it should be waiting, but then, why no complete console logger output?
I don't know if console output is cut off when stdout log streaming at some point before processes finally terminates?
I would guess that the JSON stuff doesn't come through to ES because filebeat running in the sidecar terminates even if there's outstanding stuff in files to send
I would like to know:
Can anyone advise on points 1,2 above?
Any ideas for a way to allow a little extra time or leeway for the sidecar to send stuff up, like a pod container termination order, delay on shutdown for that container, etc?
SIGTERM does indeed signal termination. The less obvious part is that when the SIGTERM handler returns, everything is considered finished.
The fix is to not return from the SIGTERM handler until the app has finished shutting down. For example, using a ManualResetEvent and Wait()ing it in the handler.
I've started to look into this for my own purposes and have come across your question over a year after it was posted... This is a bit late, but have you tried GraceTerm?
There is an associated NuGET package for this.
From the description...
Graceterm middleware provides implementation to ensure graceful shutdown of AspNet Core applications. The basic concept is: After application received a SIGTERM (a signal asking it to terminate), Graceterm will hold it alive till all pending requests are completed or a timeout occur.
I haven't personally tried this yet, but it does look promising.
Try add STOPSIGNAL SIGINT to your Dockerfile

AWS Fargate vs Batch vs ECS for a once a day batch process

I have a batch process, written in PHP and embedded in a Docker container. Basically, it loads data from several webservices, do some computation on data (during ~1h), and post computed data to an other webservice, then the container exit (with a return code of 0 if OK, 1 if failure somewhere on the process). During the process, some logs are written on STDOUT or STDERR. The batch must be triggered once a day.
I was wondering what is the best AWS service to use to schedule, execute, and monitor my batch process :
at the very begining, I used a EC2 machine with a crontab : no high-availibilty function here, so I decided to switch to a more PaaS approach.
then, I was using Elastic Beanstalk for Docker, with a non-functional Webserver (only to reply to the Healthcheck), and a Crontab inside the container to wake-up my batch command once a day. With autoscalling rule min=1 max=1, I have HA (if the container crash or if the VM crash, it is restarted by AWS)
but now, to be more efficient, I decided to move to some ECS service, and have an approach where I do not need to have EC2 instances awake 23/24 for nothing. So I tried Fargate.
with Fargate I defined my task (Fargate type, not the EC2 type), and configure everything on it.
I create a Cluster to run my task : I can run "by hand, one time" my task, so I know every settings are corrects.
Now, going deeper in Fargate, I want to have my task executed once a day.
It seems to work fine when I used the Scheduled Task feature of ECS : the container start on time, the process run, then the container stop. But CloudWatch is missing some metrics : CPUReservation and CPUUtilization are not reported. Also, there is no way to know if the batch quit with exit code 0 or 1 (all execution stopped with status "STOPPED"). So i Cant send a CloudWatch alarm if the container execution failed.
I use the "Services" feature of Fargate, but it cant handle a batch process, because the container is started every time it stops. This is normal, because the container do not have any daemon. There is no way to schedule a service. I want my container to be active only when it needs to work (once a day during at max 1h). But the missing metrics are correctly reported in CloudWatch.
Here are my questions : what are the best suitable AWS managed services to trigger a container once a day, let it run to do its task, and have reporting facility to track execution (CPU usage, batch duration), including alarm (SNS) when task failed ?
We had the same issue with identifying failed jobs. I propose you take a look into AWS Batch where logs for FAILED jobs are available in CloudWatch Logs; Take a look here.
One more thing you should consider is total cost of ownership of whatever solution you choose eventually. Fargate, in this regard, is quite expensive.
may be too late for your projects but still I thought it could benefit others.
Have you had a look at AWS Step Functions? It is possible to define a workflow and start tasks on ECS/Fargate (or jobs on EKS for that matter), wait for the results and raise alarms/send emails...

Airflow: what do `airflow webserver`, `airflow scheduler` and `airflow worker` exactly do?

I've been working with Airflow for a while now, which was set up by a colleague. Lately I run into several errors, which require me to more in dept know how to fix certain things within Airflow.
I do understand what the 3 processes are, I just don't understand the underlying things that happen when I run them. What exactly happens when I run one of the commands? Can I somewhere see afterwards that they are running? And if I run one of these commands, does this overwrite older webservers/schedulers/workers or add a new one?
Moreover, if I for example run airflow webserver, the screen shows some of the things that are happening. Can I simply get out of this by pressing CTRL + C? Because when I do this, it says things like Worker exiting and Shutting down: Master. Does this mean I'm shutting everything down? How else should I get out of the webserver screen then?
Each process does what they are built to do while they are running (webserver provides a UI, scheduler determines when things need to be run, and workers actually run the tasks).
I think your confusion is that you may be seeing them as commands that tell some sort of "Airflow service" to do something, but they are each standalone commands that start the processes to do stuff. ie. Starting from nothing, you run airflow scheduler: now you have a scheduler running. Run airflow webserver: now you have a webserver running. When you run airflow webserver, it is starting a python flask app. While that process is running, the webserver is running, if you kill command, is goes down.
All three have to be running for airflow as a whole to work (assuming you are using an executor that needs workers). You should only ever had one scheduler running, but if you were to run two processes of airflow webserver (ignoring port conflicts, you would then have two separate http servers running using the same metadata database. Workers are a little different in that you may want multiple worker processes running so you can execute more tasks concurrently. So if you create multiple airflow worker processes, you'll end up with multiple processes taking jobs from the queue, executing them, and updating the task instance with the status of the task.
When you run any of these commands you'll see the stdout and stderr output in console. If you are running them as a daemon or background process, you can check what processes are running on the server.
If you ctrl+c you are sending a signal to kill the process. Ideally for a production airflow cluster, you should have some supervisor monitoring the processes and ensuring that they are always running. Locally you can either run the commands in the foreground of separate shells, minimize them and just keep them running when you need them. Or run them in as a background daemon with the -D argument. ie airflow webserver -D.

wait for other deployments to start running before other can be created?

I am creating the deployments/services using REST APIs. I send POST request with bodies which contain the JSON objects which create the applications on Openshift. After I call all the APIs, these objects get instantiated.
I have 2 deployments which are dependent on mongodb deployment but this mongodb takes a little longer to start running, while the two deployments which are dependent on mongodb start running earlier. This breaks the code inside the 2 deployments as the mongodb connection fails(since it is not up yet).
There could be 2 possible way I can fix this problem.
I put a delay after i create mongodb deployment and recursively call the API to check it's status if it is running or not.
Just like we make changes in docker-compose, with the key, depends-on which tell the docker-compose that all the dependencies should be started first and then the dependent container.
Is there any way this could be achieved in openshift?
Instead of implementing complex logic for dependency handling, use health checking mechanism of Kubernetes. If your application starts and doesn't see Mongo DB, let it crash. Kubernetes will keep restarting it until Mongo DB comes online, and your application becomes healthy and serving as well. Kubernetes won't send traffic to not yet healthy instances.
Docs: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-probes/
Just like we make changes in docker-compose, with the key, depends-on which tell the docker-compose that all the dependencies should be started first and then the dependent container.
You might want to look into Init Containers for dependent container. They run to completion before container is actually started. Below excerpt is taken from referenced documentation (given below) for use cases that might be applicable to your issue:
They run to completion before any app Containers start, whereas app Containers run in parallel, so Init Containers provide an easy way to block or delay the startup of app Containers until some set of preconditions are met.
Examples
Here are some ideas for how to use Init Containers:
Wait for a service to be created with a shell command like:
for i in {1..100}; do sleep 1; if dig myservice; then exit 0; fi; done; exit 1
Register this Pod with a remote server from the downward API with a command like:
curl -X POST http://$MANAGEMENT_SERVICE_HOST:$MANAGEMENT_SERVICE_PORT/register -d ‘instance=$()&ip=$()’
Wait for some time before starting the app Container with a command like sleep 60.
Reference documentation:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Alex has pointed out correct practice to follow with kubernetes. But if you still want directly depend on other pod phase you can use this pod-dependency-init-container that I have build. This will check if any pod with given labels is running before starting your pod.