Upstart problems with clone using CLONE_NEWPID and expect fork - upstart

I have a binary which looks something like the following:
int RealMain() {
    /* ... runs forever ... */
}

int main() {
    clone(RealMain, ..., CLONE_NEWPID /* clone flags */, ...);
    _exit(0);
}
The idea is to launch a process which launches the actual process via clone() and exits. The reason for this model is the "CLONE_NEWPID" flag. I need the app to run in a separate PID namespace. Therefore, the actual application process needs to be created via clone using the CLONE_NEWPID flag.
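A minimal standalone version of this pattern looks roughly like this (the stack size, the SIGCHLD flag, and the pause() loop are illustrative placeholders, not my real code):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

static int real_main(void *arg) {
    /* Inside the new namespace this process sees itself as PID 1. */
    printf("pid inside namespace: %d\n", (int)getpid());
    for (;;)
        pause();                      /* stands in for "runs forever" */
    return 0;
}

int main(void) {
    static char stack[1024 * 1024];   /* child stack; clone() takes its top end */
    pid_t pid = clone(real_main, stack + sizeof(stack),
                      CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        perror("clone");              /* EPERM here without CAP_SYS_ADMIN */
        exit(1);
    }
    _exit(0);                         /* launcher exits immediately */
}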
When I launch this binary from the command line, everything works well. But when I launch it through Upstart, clone() fails with errno set to EPERM and the new PID namespace is never created. I'm using "expect fork" in the Upstart config because of the clone() call above; that way I can tie the job's liveness to the lifetime of the child process executing RealMain():
expect fork
respawn
I'm wondering if there is some bad interaction between the implementation of "expect fork" and the use of CLONE_NEWPID to create a new PID namespace. I found the following source about "expect fork" and ptrace issues, but haven't found anyone else reporting a problem with CLONE_NEWPID:
https://blueprints.launchpad.net/ubuntu/+spec/foundations-q-upstart-overcome-ptrace-limitations
Thanks

Failure/timeout invoking Lambda locally with SAM

I'm trying to get a local env to run/debug Python Lambdas with VSCode (Windows). I'm using the provided HelloWorld example to get the hang of this, but I'm not able to invoke it.
Steps used to setup SAM and invoke the Lambda:
I have Docker installed and running
I have installed the SAM CLI
My AWS credentials are in place and working
I have no connectivity issues and I'm able to connect to AWS normally
I created the SAM application (HelloWorld) with all the files and resources; I didn't change anything.
I ran "sam build" and it finished successfully.
I ran "sam local invoke" and it failed with a timeout. I increased the timeout to 10s; it still times out. The HelloWorld Lambda code only prints and does nothing else, so I'm guessing the code isn't the problem, but something else relating to the container or the SAM env itself.
C:\xxxxxxx\lambda-python3.8>sam build
Your template contains a resource with logical ID "ServerlessRestApi", which is a reserved logical ID in AWS SAM. It could result in unexpected behaviors and is not recommended.
Building codeuri: C:\xxxxxxx\lambda-python3.8\hello_world runtime: python3.8 metadata: {} architecture: x86_64 functions: ['HelloWorldFunction']
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource

Build Succeeded

Built Artifacts : .aws-sam\build
Built Template : .aws-sam\build\template.yaml

C:\xxxxxxx\lambda-python3.8>sam local invoke
Invoking app.lambda_handler (python3.8)
Skip pulling image and use local one: public.ecr.aws/sam/emulation-python3.8:rapid-1.51.0-x86_64.
Mounting C:\xxxxxxx\lambda-python3.8\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
Function 'HelloWorldFunction' timed out after 10 seconds
No response from invoke container for HelloWorldFunction
Any hints on what's missing here?
Thanks.
Usually a Lambda function times out because of some resource dependency. Are you using any external resource, maybe a DB connection or a REST API call?
Put more prints in lambda_handler (your function handler) before each resource call; then you will know exactly where it is waiting. Also increase the timeout to 1 minute or more, because most external resource calls over HTTPS have 30-second timeouts.
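For example, an instrumented handler might look roughly like this (the print statements and return value are illustrative):

import json

def lambda_handler(event, context):
    print("handler entered")  # should appear in the logs even if a later call hangs
    # print before every external call (DB, HTTPS, ...) to see where it blocks
    print("finished work, returning")
    return {"statusCode": 200, "body": json.dumps({"message": "hello world"})}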
The log suggests that either the container wasn't started, or SAM couldn't connect to it.
Sometimes hostname resolution on Windows can be affected by the hosts file or system settings.
Try running the invoke command as follows (this will make the container ports bind to all interfaces):
sam local invoke --container-host-interface 0.0.0.0
...additionally try setting the container-host parameter (set to localhost by default):
sam local invoke --container-host-interface 0.0.0.0 --container-host host.docker.internal
The next piece of the puzzle is incorporating these settings into VSCode. This can be done in two places:
Create samconfig.toml in the root dir of the project with the following contents. This will allow running sam local invoke from the terminal without having to add the command-line arguments:
version = 0.1

[default.local_invoke.parameters]
container_host_interface = "0.0.0.0"
Update the launch configuration as follows to enable VSCode debugging:
...
"sam": {
    "localArguments": ["--container-host-interface", "0.0.0.0"]
}
...
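Putting it together, a full AWS Toolkit launch configuration might look roughly like this (the name, template path, and logical ID are placeholders for your project; the overall shape assumes the toolkit's aws-sam direct-invoke configuration type):

{
    "type": "aws-sam",
    "request": "direct-invoke",
    "name": "HelloWorldFunction (local)",
    "invokeTarget": {
        "target": "template",
        "templatePath": "${workspaceFolder}/template.yaml",
        "logicalId": "HelloWorldFunction"
    },
    "sam": {
        "localArguments": ["--container-host-interface", "0.0.0.0"]
    }
}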

Capistrano - mark deployment as failed

I'm using Capistrano 3.
I want to trigger a webhook to an external service when my deployment fails.
It's a matter of calling a method I have already defined, let's say it's called mark_failed.
How can I ensure the method is always called when the deployment fails, for whatever reason, except when aborting it via CTRL+C?
I tried fiddling around with
rescue SystemExit, Interrupt and rescue StandardError
but I have no clue where to put my method call so that it will be called reliably.
Any clues?
I would suggest using at_exit.
at_exit do
  # $! holds the exception (if any) that the process is exiting with
  mark_failed if $!
end

raise "Something is wrong!"

No error when stopping non existing service with chef

I'm new to Chef and trying to understand why this code does not return any error, while if I do the same with 'start' I get an error saying the service does not exist.
service 'non-existing-service' do
  action :stop
end
# chef-apply test.rb
Recipe: (chef-apply cookbook)::(chef-apply recipe)
* service[non-existing-service] action stop (up to date)
I don't know which platform you are running on, but if you are on Windows it should at least log
Chef::Log.debug "#{@new_resource} does not exist - nothing to do"
given that you have debug as the log level.
You could argue this is the wrong behaviour, but if the service does not exist, it for sure isn't running.
Source code
https://github.com/chef/chef/blob/master/lib/chef/provider/service/windows.rb#L147
If you are getting one of the variants of the init.d provider, they default to determining the current status of a service by grepping the process table. Because Chef does its own idempotence checks internally before calling the provider's stop method, it sees there is no such process in the table and assumes the service is already stopped.
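If you want a missing service to surface as an error on :stop, one option is an explicit guard along these lines (a sketch for init.d-style systems; the script path is an assumption):

# Hypothetical guard: fail the run if the init script is missing.
ruby_block 'assert non-existing-service is installed' do
  block do
    unless ::File.exist?('/etc/init.d/non-existing-service')
      raise 'non-existing-service is not installed'
    end
  end
end

service 'non-existing-service' do
  action :stop
end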

Celery Beat - Pyramid Mailer

So, I have some plain Python code which works perfectly in a normal Python shell:
from pyramid_mailer.mailer import Mailer
from pyramid_mailer.message import Message
from pyramid_mailer.message import Attachment

mailer = Mailer(
    host="172.10.10.240",
    port="25")

message = Message(
    subject="Orders with invalid status",
    sender='r@example.com',
    recipients=['luke@example.com'],
    html="<p>Test</p>")

mailer.send_immediately(message)
But if I create a Celery beat task such as this:
from pyramid_celery import celery_app as app
from pyramid_mailer.mailer import Mailer
from pyramid_mailer.message import Message
from pyramid_mailer.message import Attachment

mailer = Mailer(
    host="172.10.10.240",
    port="25")

@app.task
def wronglines_celery():
    message = Message(
        subject="Orders with invalid status",
        sender='r@example.com',
        recipients=['luke@example.com'],
        html="<p>Test</p>")
    mailer.send_immediately(message)
This second example does not generate an email; it runs fine and throws no error at all, even with the log level set to DEBUG.
Running celery beat with:
celery beat -A pyramid_celery.celery_app --ini development.ini
I'm using the pyramid_celery plugin, as referenced in the official documentation on the Celery website. The relevant parts of my development.ini file can be seen below:
[celery]
BROKER_URL = amqp://app_rmq:password@localhost:5672/myvhost
CELERY_IMPORTS = intranet.celery_tasks
# Check once a day for orders with wrong line status
[celerybeat:task1]
task = intranet.celery_tasks.wronglines_celery
type = crontab
schedule = {"hour": 16, "minute": 30}
[logger_celery]
level = DEBUG
handlers =
qualname = celery
# Begin logging configuration
[loggers]
keys = root, intranet, sqlalchemy, celery
EDIT:
If I launch Celery (without beat) it works perfectly, e.g. if I launch with:
celery worker -A pyramid_celery.celery_app --ini development.ini
all tasks execute (over and over), all emails send, and nothing throws an error; it seems to be the introduction of beat that is causing issues.
Are you sure it's not working? The way we've configured your crontab, it says "only run once a day, at 4:30 PM". So if you let it run until it hit 4:30 I would expect it to execute properly.
Can you change your schedule to be {} instead to have it run every minute as a basic test?
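In the development.ini above, that would be:

[celerybeat:task1]
task = intranet.celery_tasks.wronglines_celery
type = crontab
schedule = {}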
I've added a crontab example to the examples here:
https://github.com/sontek/pyramid_celery/blob/master/examples/scheduler_example/development.ini#L33-L36
If you can provide more code (maybe a sample repo or modification of the examples already in the repo) that shows it not working I can take a look and hopefully fix the bug.
So, after much googling and frustrating debugging, I found an old GitHub issue claiming that Celery tasks were executing only when launched with a worker, not with beat. The user states:
Beat does not execute tasks, it just sends the messages. You need both a beat instance and a worker instance!
So, to launch the worker and the beat instance with the same command:
celery worker --beat -A pyramid_celery.celery_app --ini development.ini
I will be sending a pull request today to fix the documentation with regards to the correct way to launch a worker and beat instance.
By default, Celery tasks fail silently; it most likely throws an exception which you never see.
To be sure what's failing, put a pdb (or ipdb) breakpoint in the task code, start the Celery worker in the foreground, and step through the code line by line.
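For example (the pdb call is illustrative; the worker must not be daemonized for the debugger prompt to appear):

from pyramid_celery import celery_app as app

@app.task
def wronglines_celery():
    import pdb; pdb.set_trace()  # drops into the debugger when the worker runs the task
    # ... step through the mailer calls line by line from here ...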

Mqseries queuemanager Name error(Reason code 2058)

I am trying to connect to my local queue using the CPAN MQSeries library through a Perl script, in a Solaris environment. When I execute my script it gives reason code 2058, which means queue manager name error.
I have done the following things to analyze this issue, but am still getting reason code 2058:
1) Stopped and started the queue manager.
2) Checked the queue manager name in my code.
3) Successfully put and got messages on my queue using the amqsput and amqsget commands, but it does not work with my script.
Could anybody please help me with this? What kind of environment do I have to set, or what configuration setting am I missing?
my $qm_name  = "MQTEST";
my $compCode = MQCC_WARNING;
my $Reason   = MQRC_UNEXPECTED_ERROR;

my $Hconn = MQCONN($qm_name, $compCode, $Reason);
# MQCONN reports failures through the completion and reason codes,
# so check those rather than relying on the return value alone.
die "Unable to connect to queue manager: CompCode=$compCode Reason=$Reason\n"
    unless $compCode == MQCC_OK;
Maybe you are running into this issue?
"By default, the MQSeries module will try to dynamically determine whether or not the localhost has any queue managers installed, and if so, use the "server" API; otherwise, it will use the "client" API. This will Do The Right Thing (tm) for most applications, unless you want to connect directly to a remote queue manager from a host which is running other queue managers locally. Since the existence of locally installed queue managers will result in the use of the "server" API, attempts to connect to the remote queue managers will fail with a Reason Code of 2058."
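If the module does end up using the "client" API, it also needs to know where the queue manager listens. One standard way to supply that is the MQSERVER environment variable (the channel name and port below are assumptions; adjust them to your setup):

export MQSERVER='SYSTEM.DEF.SVRCONN/TCP/localhost(1414)'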