How to register Buildbot worker with master? - buildbot

I had to replace a Buildbot master server, and even though I seemingly reinstalled it with the identical code and settings as before, I must have missed something, because it's now rejecting all requests from all existing workers with errors in its twistd.log file like:
2018-08-22 21:17:28-0400 [Broker,678,10.229.39.202] invalid login from unknown user 'worker2'
2018-08-22 21:17:28-0400 [Broker,678,10.229.39.202] Peer will receive following PB traceback:
2018-08-22 21:17:28-0400 [Broker,678,10.229.39.202] Unhandled Error
Traceback (most recent call last):
Failure: twisted.cred.error.UnauthorizedLogin:
How do I re-register the workers with the master? The docs don't mention this, nor where the worker usernames/passwords are stored. I tried re-running the buildbot-worker create-worker ... commands and then restarting Buildbot, but that had no effect.

Good day.
On a worker, the name/password are stored in buildbot.tac.
The name/password on the worker must be the same as in the master.cfg file on the master.
For example, if master.cfg on the master contains
c['workers'].append(worker.Worker('remote-worker', 'pass'))
then buildbot.tac on the worker should contain
...
workername='remote-worker'
passwd='pass'
...
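For reference, these values are what buildbot-worker create-worker BASEDIR MASTERHOST[:PORT] WORKERNAME PASSWORD writes into the worker's buildbot.tac. A minimal sketch of the relevant section, with placeholder host and credentials (your real values will differ):
buildmaster_host = 'master.example.com'  # where the master is reachable
port = 9989                              # the PB port configured on the master (c['protocols']['pb']['port'])
workername = 'remote-worker'             # must match the name in worker.Worker(...) in master.cfg
passwd = 'pass'                          # must match the password in master.cfg
After editing these values, restart the worker (buildbot-worker restart BASEDIR) so it reconnects with the new credentials.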
Please note that after each upgrade of the Buildbot code you should upgrade the master via the buildbot upgrade-master command (a plain change to master.cfg only needs a reconfig or restart):
http://docs.buildbot.net/current/manual/installation/buildmaster.html
The upgrade-master command is idempotent. It is safe to run it
multiple times. After each upgrade of the buildbot code, you should
use upgrade-master on all your buildmasters.
P.S.
On 3rd September the Buildbot team released version 1.4. I have a Buildbot master on 1.3 and one Buildbot worker on 1.4, and they work together correctly.

Related

Failure/timeout invoking Lambda locally with SAM

I'm trying to get a local environment to run/debug Python Lambdas with VS Code (Windows). I'm using the provided HelloWorld example to get the hang of this, but I'm not able to invoke it.
Steps used to setup SAM and invoke the Lambda:
I have Docker installed and running
I have installed the SAM CLI
My AWS credentials are in place and working
I have no connectivity issues and I'm able to connect to AWS normally
I create the SAM application (HelloWorld) with all the files and resources; I didn't change anything.
I run "sam build" and it finishes successfully.
I run "sam local invoke" and it fails with a timeout. I increased the timeout to 10s; it still times out. The HelloWorld Lambda code only prints and does nothing else, so I'm guessing the code isn't the problem, but something else relating to the container or the SAM environment itself.
C:\xxxxxxx\lambda-python3.8>sam build
Your template contains a resource with logical ID "ServerlessRestApi", which is a reserved logical ID in AWS SAM. It could result in unexpected behaviors and is not recommended.
Building codeuri: C:\xxxxxxx\lambda-python3.8\hello_world runtime: python3.8 metadata: {} architecture: x86_64 functions: ['HelloWorldFunction']
Running PythonPipBuilder:ResolveDependencies
Running PythonPipBuilder:CopySource
Build Succeeded
Built Artifacts : .aws-sam\build
Built Template : .aws-sam\build\template.yaml
C:\xxxxxxx\lambda-python3.8>sam local invoke
Invoking app.lambda_handler (python3.8)
Skip pulling image and use local one: public.ecr.aws/sam/emulation-python3.8:rapid-1.51.0-x86_64.
Mounting C:\xxxxxxx\lambda-python3.8\.aws-sam\build\HelloWorldFunction as /var/task:ro,delegated inside runtime container
Function 'HelloWorldFunction' timed out after 10 seconds
No response from invoke container for HelloWorldFunction
Any hints on what's missing here?
Thanks.
Most often, a Lambda function times out because of some resource dependency. Are you using any external resource, maybe a DB connection or a REST API call?
Please put more prints in lambda_handler (your function handler) before calling any resource; then you will know where exactly it is waiting. Also increase the timeout to 1 minute or more, because most external resource calls over HTTPS have 30-second timeouts.
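For illustration, a minimal sketch of how the handler could be instrumented (the commented-out external call and call_external_resource are hypothetical placeholders; the stock HelloWorld handler makes no external calls):
import json

def lambda_handler(event, context):
    print("handler entered")                  # if this never appears, the container/SAM setup is the problem
    # print("calling external resource...")   # add a print like this before every external call
    # response = call_external_resource()     # hypothetical helper: DB query, REST call, etc.
    # print("external resource returned")
    print("returning response")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "hello world"}),
    }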
The log suggests that either the container wasn't started, or SAM couldn't connect to it.
Sometimes the hostname resolution on Windows can be affected by hosts file or system settings.
Try running the invoke command as follows (this will make the container ports bind to all interfaces):
sam local invoke --container-host-interface 0.0.0.0
...additionally try setting the container-host parameter (set to localhost by default):
sam local invoke --container-host-interface 0.0.0.0 --container-host host.docker.internal
The next piece of the puzzle is incorporating these settings into VS Code. This can be done in two places:
Create samconfig.toml in the root directory of the project with the following contents. This will allow running sam local invoke from the terminal without having to add the command-line argument:
version=0.1
[default.local_invoke.parameters]
container_host_interface = "0.0.0.0"
Update the launch configuration as follows to enable VS Code debugging (a fuller sketch follows after this fragment):
...
"sam": {
"localArguments": ["--container-host-interface","0.0.0.0"]
}
...
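For orientation, a sketch of where that fragment might sit in a full launch.json entry, assuming the AWS Toolkit's aws-sam / direct-invoke launch type (the name, template path, and logical ID are placeholders; check the field names against your toolkit version):
{
    "type": "aws-sam",
    "request": "direct-invoke",
    "name": "HelloWorldFunction (local)",
    "invokeTarget": {
        "target": "template",
        "templatePath": "${workspaceFolder}/template.yaml",
        "logicalId": "HelloWorldFunction"
    },
    "sam": {
        "localArguments": ["--container-host-interface", "0.0.0.0"]
    }
}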

Azure DevOps Pipelines "Waiting for console output from an agent..."

I require something from the output of a running release task in order for it to complete (an authentication code). But the console is now not updating. All I get is "Waiting for console output from an agent..."
This happens on both our self-hosted agents (Linux or Windows) and on the Hosted Ubuntu 1604 agent.
The step in question is the standard Kubernetes task: https://github.com/Microsoft/azure-pipelines-tasks/tree/master/Tasks/KubernetesV1
This was not always happening.
To rule out the possibility of kubectl awaiting console input (as has been discussed above), you could try
kubectl apply --dry-run=client [other args]
or
kubectl apply --dry-run=server [other args]
This could give you guidance as to how to proceed, perhaps with --force or --overwrite flags if needed.
I have the same issue. After troubleshooting and canceling the task, I noticed that the agent was waiting for a response from the user.
In my case, I was trying to unzip a file where the destination folder already exists with content. So the system was asking the user whether to replace the destination folder's content; that's why the agent was waiting (a possible fix is sketched after the log below).
2020-03-23T04:14:57.8941954Z unzip /home/azure-deploy-test/AutoEcole.zip -d /home/test-deployment/
2020-03-23T04:14:57.9086229Z Archive: /home/azure-deploy-test/AutoEcole.zip
2020-03-23T04:14:57.9087639Z
2020-03-23T04:14:57.9136932Z ##[error]replace /home/test-deployment/AutoEcole? [y]es, [n]o, [A]ll, [N]one, [r]ename:
2020-03-23T04:53:12.1979529Z ##[error]The operation was canceled.
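Assuming the intent is to always overwrite what is already in the destination folder, unzip can be run non-interactively with the -o flag so the agent never waits for input:
unzip -o /home/azure-deploy-test/AutoEcole.zip -d /home/test-deployment/
(Use -n instead if existing files should never be overwritten.)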
This was an issue with Microsoft's Azure DevOps Services that has been acknowledged and rectified by Microsoft.
This issue was reported as an issue with the "Liveness in Release Management UI".
All you have to do is access your project using the URL below:
https://dev.azure.com/{your organization}/{your project}
This is an official solution provided by Microsoft. This resolved the issue for me.
Please share more details in the comments section if you still face the issue.

Google cloud datalab deployment unsuccessful - sort of

This is a different scenario from other questions on this topic. My deployment almost succeeded, and I can see the following lines at the end of my log:
[datalab].../#015Updating module [datalab]...done.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deployed module [datalab] to [https://main-dot-datalab-dot-.appspot.com]
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Step deploy datalab module succeeded.
Jul 25 16:22:36 datalab-deploy-main-20160725-16-19-55 startupscript: Deleting VM instance...
The landing page keeps showing a wait bar indicating the deployment is still in progress. I have tried deploying several times in last couple of days.
About the additions described on the landing page:
An App Engine "datalab" module is added. When I click on the pop-out URL "https://datalab-dot-.appspot.com/" it shows an error page with "404 page not found".
A "datalab" Compute Engine network is added. Under "Compute Engine > Operations" I can see a create-instance operation for the datalab deployment with my ID and a delete-instance operation with the *******-compute@developer.gserviceaccount.com ID. Not sure what that means.
A datalab branch is added to the git repo. Yes, and with all the components.
I think the deployment is partially successful. When I visit the landing page again, the only option I see is to deploy datalab again, not to start it. Can someone spot the problem? Appreciate the help.
I read the other posts on this topic and tried to verify my deployment using "https://console.developers.google.com/apis/api/source/overview?project=", and I get the following message:
The API doesn't exist or you don't have permission to access it
You can try looking at the App Engine dashboard in the Cloud Console, to verify that there is a "datalab" service deployed.
If that is missing, then you need to redeploy again (or switch to the new locally-run version).
If that is present, then you should also be able to see a "datalab" network under Networking, and a VM instance named something like "gae-datalab-main-..." under Compute Engine. If either of those is missing, then try going back to the App Engine console, deleting the "datalab" service, and redeploying.
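If you prefer the command line, roughly the same checks can be done with a current Cloud SDK (assuming gcloud is installed and pointed at the affected project; the name filters below are based on the instance name pattern mentioned above):
gcloud app services list
gcloud compute networks list --filter="name=datalab"
gcloud compute instances list --filter="name~gae-datalab-main"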

GitLab CI - Project Build In Neverending Pending-State

I'm in some trouble with GitLab CI.
I followed offical guide on:
https://github.com/gitlabhq/gitlab-ci/blob/master/doc/installation.md
Everything was OK, no errors anywhere. I followed the runner setup, too.
Everything seemed alright.
But...
When I add a runner to a project and then try to build, nothing happens.
It could be that I have not fully understood something or some of my configs are wrong.
I'm absolutely new to GitLab CI, but I like it and I want to learn new stuff.
I would be very very glad if someone could help me in some way.
Thanks!
BIG UPDATE:
Just figured out that:
~/gitlab-runners/gitlab-ci-runner$ bin/runner
Starting a runner process manually solves the problem, but if I look at gitlab-ci-runner in /etc/init.d it appears to be running!?
~/gitlab-runners/gitlab-ci-runner$ sudo /etc/init.d/gitlab-ci-runner start
Number of registered runners in PID file=1
Number of running runners=0
Error! GitLab CI runner(s) (gitlab-ci-runner) appear to be running already! Try stopping them first. Exiting.
~/gitlab-runners/gitlab-ci-runner$ sudo /etc/init.d/gitlab-ci-runner stop
Number of registered runners in PID file=1
Number of running runners=0
WARNING: Numbers of registered runners don't match number of running runners. Will try to stop them all
Registered runners=1
Running runners=0
Trying to stop registered runners...kill: No such process
OK
Trying to kill ghost runners...OK
What's wrong here? Am I out of my depth, or just not seeing the problem?!
Problem solved!
You need to edit some values in the /etc/init.d/gitlab-ci-runner script:
APP_ROOT="**PATH_TO**/gitlab-runners/gitlab-ci-runner"
APP_USER="**USER_WITH_DIRRIGHTS!**"
PID_PATH="$APP_ROOT/tmp/pids"
PROCESS_NAME="ruby ./bin/runner"
RUNNERS_PID="$PID_PATH/runners.pid"
RUNNERS_NUM=1 # number of runners to spawn
START_RUNNER="nohup bundle exec ./bin/runner"
Now it works!
In my case the tags on the runner were different from the tags in .gitlab-ci.yml. Once I changed them so that the runner's tags include all of the tags used by the jobs in the config file, tasks began to run.
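For illustration, a minimal sketch of the relationship (the job name, script, and my-runner-tag are placeholders): a job that carries a tag will only be picked up by a runner registered with that same tag.
test_job:
  script:
    - ./run_tests.sh
  tags:
    - my-runner-tag   # the runner must be registered with this tag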

WSO2 ESB Deployment Synchronizer stuck (can't gracefully shutdown or deploy services)

We are facing some issues with the WSO2 ESB Deployment Synchronizer. Since we have a clustered configuration, we are using SVN to store the content of "repository/deployment/server". The carbon.xml configuration is the following:
<DeploymentSynchronizer>
<Enabled>true</Enabled>
<AutoCommit>false</AutoCommit><!--true for the mgt node-->
<AutoCheckout>true</AutoCheckout>
<RepositoryType>svn</RepositoryType>
<SvnUrl>https://svn/x/trunk/serverESB/desenv/</SvnUrl>
<SvnUser>user</SvnUser>
<SvnPassword>password</SvnPassword>
<SvnUrlAppendTenantId>false</SvnUrlAppendTenantId>
</DeploymentSynchronizer>
It works correctly for some time, but after some deploys and undeploys it stops working. Although it still gives the message that it is going to synchronize, and the SVN update seems to be performed correctly, the ESB does not load the newly deployed XMLs:
TID: [0] [ESB] INFO {org.wso2.carbon.core.deployment.SynchronizeRepositoryRequest} - Received [SynchronizeRepositoryRequest{tenantId=-1234, tenantDomain='carbon.super', messageId=f9b51e23-8a3c-4f08-acb0-5a1f0f4590b2}] {org.wso2.carbon.core.deployment.SynchronizeRepositoryRequest}
TID: [0] [ESB] INFO {org.wso2.carbon.core.deployment.SynchronizeRepositoryRequest} - Going to synchronse artefacts. {org.wso2.carbon.core.deployment.SynchronizeRepositoryRequest}
Normally after this message it prints an INFO line saying that new services were deployed, but that does not occur.
When I try to shut down the server, it gives me the message "Waiting for deployment completion..." and gets stuck (so I have to kill it using "kill -9"):
TID: [0] [ESB] INFO {org.wso2.carbon.core.ServerManagement} - Waiting for deployment completion... {org.wso2.carbon.core.ServerManagement}
If I manually restart it, all the deployments work fine, and the synchronizer starts working correctly again (for some time).
P.S.: I've tried using the OS's svn (SuSE) and also the SVNKit module. Our SVN repository version is 1.5.1.
There are a few docs out there which are not up to date; hence they can make things more difficult. I too have tried those and ended up with unexpected problems.
Have you tried the instructions provided in the latest WSO2 product clustering docs?
http://docs.wso2.org/wiki/display/Cluster/Creating+a+Cluster
http://docs.wso2.org/wiki/display/Cluster/Configuring+Deployment+Synchronizer
This information is up to date and well tested with a sample setup (an ESB cluster with one manager node and three worker nodes fronted by an Elastic Load Balancer). If you have followed those instructions, this should work fine. If you have already followed the same steps and still got stuck with this issue, please confirm whether or not you followed the instructions provided by these documents.
Thanks.