Connect containerized self-hosted agent with Azure DevOps - azure-devops

I followed the instructions in the MS docs guide, and the agent started without any issues. However, it never showed up in my agent pool. I tried a different version of the start.sh script found on GitHub, and it connected immediately. Is there anything else I can do to troubleshoot this? Logs from the non-working agent are below.
❯ kubectl logs azpagent-55864668dc-zgdrn
1. Determining matching Azure Pipelines agent...
2. Downloading and installing Azure Pipelines agent...
3. Configuring Azure Pipelines agent...
>> End User License Agreements:
Building sources from a TFVC repository requires accepting the Team Explorer Everywhere End User License Agreement. This step is not required for building sources from Git repositories.
A copy of the Team Explorer Everywhere license agreement can be found at:
/azp/agent/externals/tee/license.html
>> Connect:
Connecting to server ...
>> Register Agent:
Scanning for tool capabilities.
Connecting to the server.
Successfully replaced the agent
Testing agent connection.
2019-08-03 04:22:56Z: Settings Saved.
4. Running Azure Pipelines agent...
Starting Agent listener interactively
Started listener process
Started running service
Scanning for tool capabilities.
Connecting to the server.
2019-08-03 04:23:08Z: Agent connect error: The signature is not valid.. Retrying until reconnected.
Not really sure what else to try -- has anyone else seen this issue, or had success with the Linux agent guide?

Looking at the error message:
The signature is not valid.
There might be a problem with the provided PAT. I'd suggest generating a new PAT, as described in this guide, and trying again.
Let me know if this has helped.
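If the agent runs in Kubernetes like the one in the logs above, the regenerated PAT also has to reach the container. A minimal sketch, assuming the token is consumed via the AZP_TOKEN environment variable and the deployment is named azpagent as the pod name suggests (the secret name here is hypothetical; match whatever your manifest references):

# Store the regenerated PAT in a secret (secret name is an assumption).
kubectl delete secret azp-token --ignore-not-found
kubectl create secret generic azp-token --from-literal=AZP_TOKEN='<new-PAT>'

# Restart the pod so the agent re-registers with the new token
# (kubectl rollout restart needs kubectl 1.15+; otherwise just delete the pod).
kubectl rollout restart deployment/azpagent

# Watch the agent configure itself and connect.
kubectl logs -f deployment/azpagent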

Update
According to the error info The signature is not valid.:
Are you building sources from a TFVC repository? That requires accepting the Team Explorer Everywhere End User License Agreement; it is not required when building sources from Git repositories. If so, try building from a Git repo instead.
The different version of the start.sh script you referred to is deprecated; it's for an old build agent.
Given the related error The signature is not valid.. Retrying until reconnected., a few things I would suggest (an upgrade sketch follows after this list):
You may be on a pretty old agent version; try the latest agent release:
https://github.com/microsoft/azure-pipelines-agent/releases
You need to restart the agent process for new environment variables to take effect.
Check with your IT department to make sure the network between your build machine and the TFS server/Azure DevOps Service is reliable, and check whether anything in your network has changed.
Also make sure your build machine/VM has not run out of resources.
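If you prefer to pin or upgrade the agent yourself rather than rely on whatever start.sh resolves, a manual upgrade looks roughly like this. The tarball URL is a placeholder -- copy the real Linux x64 link from the releases page above -- and the AZP_* variables are the same ones the container script already uses:

# Placeholder: substitute the real tarball URL from the releases page.
AGENT_TARBALL_URL='<linux-x64 tarball URL from the releases page>'

mkdir -p ~/azagent && cd ~/azagent
curl -fsSL "$AGENT_TARBALL_URL" | tar -xz

# Re-register against your organization and pool, then run the listener.
./config.sh --unattended \
  --url "$AZP_URL" \
  --auth PAT --token "$AZP_TOKEN" \
  --pool "${AZP_POOL:-Default}" \
  --replace --acceptTeeEula
./run.sh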

In case this or a similar issue occurs for anyone else, the suggestion from #juliobbv was very helpful. If you comment out the last line of the script, and replace it with
./bin/Agent.Listener run & wait $!
you can get a clearer view of any error messages.
In my case, I didn't realize that AGENT_NAME and POOL were no longer the same variable, and the original error message didn't indicate that the issue was my lack of permissions to the default pool.
My final changes to the script are below -- I defaulted the agent name to the hostname and kept the previous behavior of using a custom pool:
./config.sh --unattended \
--agent "$(hostname)" \
--url "$AZP_URL" \
--auth PAT \
--token "$(cat "$AZP_TOKEN_FILE")" \
--pool "${AZP_POOL:-Default}" \
--work "${AZP_WORK:-_work}" \
--replace \
--acceptTeeEula & wait $!
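After rebuilding the image with the modified script, a quick sanity check that registration actually succeeded (a sketch; the deployment name matches the pod shown in the logs above):

# Tail the agent output after redeploying.
kubectl logs -f deployment/azpagent

# Success looks like the config step printing "Settings Saved." followed by the
# listener reporting "Listening for Jobs"; the agent should then appear under
# the chosen pool in the Azure DevOps UI.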

Related

Agent version 2.173.0 fails to connect to Azure DevOps

Agent Version and Platform
2.173.0
on
centos-release-7-6.1810.2.el7.centos.x86_64
It's a release agent for a deployment pool.
Azure DevOps Type and Version
dev.azure.com (cloud)
What's not working?
# Running run once with agent version 2.160.1
./run.sh --once
Scanning for tool capabilities.
Connecting to the server.
2020-08-25 21:31:02Z: Listening for Jobs
Agent update in progress, do not shutdown agent.
Downloading 2.173.0 agent
Waiting for current job finish running.
Generate and execute update script.
Agent will exit shortly for update, should back online within 10 seconds.
‘/root/azagent/_diag/SelfUpdate-20200825-213148.log’ -> ‘/root/azagent/_diag/SelfUpdate-20200825-213148.log.succeed’
Scanning for tool capabilities.
Connecting to the server.
# this now runs indefinitely
Is there a way to stop the auto update? Multiple agents on production machines are offline and I have, as of now, no idea how to fix that.
agent.log
Edit: It is a Release Agent in a Deployment Group. Also, there is a GitHub issue now: https://github.com/microsoft/azure-pipelines-agent/issues/3093
To resolve the Authentication failed with status code 401 error, you can try the steps below:
1. Create a new PAT with manage permission, then reconfigure the agent with the config.sh file (a sketch follows below).
2. If that doesn't work, try creating a new agent pool to register new agents in.
To stop the auto-update, disable the agent update option under Organization settings => Agent Pools => Settings.
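Since this is a deployment group agent, re-registering from the agent folder with the new PAT looks roughly like this. This is a sketch: the flag names are the ones used by the registration script the portal generates for deployment groups, but double-check them against your agent version, and the organization, project, and group names are placeholders:

cd /root/azagent

# Unregister the stuck agent.
./config.sh remove --auth PAT --token '<PAT>'

# Re-register into the deployment group with the new PAT.
./config.sh --unattended \
  --url 'https://dev.azure.com/<organization>' \
  --auth PAT --token '<new-PAT>' \
  --deploymentgroup \
  --projectname '<project>' \
  --deploymentgroupname '<deployment group>' \
  --agent "$(hostname)" \
  --replace --acceptTeeEula

# Run interactively first to confirm it connects, then reinstall as a service if needed.
./run.sh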

Azure DevOps Agent won't start and shows: Error 1 Incorrect Function - Service could not start

I configured the build agent as a service but when I go to start the agent I get the error:
Error 1 Incorrect Function - Service could not start
Azure DevOps Agent configured as a service but service does not start
I changed my user to Local System (NT AUTHORITY\SYSTEM) in services.msc and it worked as intended.
Copied from the comments:
OK, I'll answer my own question: when the config.cmd command is run, it allocates NETWORK SERVICE as the account to run the service. However, it does NOT automatically give that account permissions to the folders where the agent is installed, so the service fails to run. This really should be flagged when running the config.cmd command! The error message is nonsense and misleading. So if the agent is in c:\users\abc\agent, you need to give NETWORK SERVICE permission to access that folder!
Running from C:\Agent worked perfectly for me after struggling for a few days.
This happens when the Azure Pipelines Agent service on the machine is set to log on as NETWORK SERVICE.
To resolve this:
1. Right-click the Azure Pipelines Agent service
2. Select Properties
3. Click on the Log On tab
4. Select Local System account
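Alternatively, if you want to keep NETWORK SERVICE as the log-on account, granting it rights to the agent folder from an elevated prompt looks roughly like this (a sketch; the agent path and service name are placeholders for your own -- check the exact service name in services.msc):

:: Give NETWORK SERVICE full control of the agent folder, recursively.
icacls "C:\users\abc\agent" /grant "NETWORK SERVICE:(OI)(CI)F" /T

:: Or switch the service to Local System from the command line instead of services.msc.
sc config "vstsagent.myorg.Default.MYAGENT" obj= LocalSystem
sc start "vstsagent.myorg.Default.MYAGENT"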

How to implement a HashiCorp Vault 3-node cluster?

We have implemented HashiCorp open-source Vault as a single node with a Consul backend.
We need help implementing a 3-node Vault cluster for HA in a single datacenter, as well as across multiple datacenters.
Could you please help with this?
The Vault Deployment guide has more on this.
https://learn.hashicorp.com/vault/operations/ops-deployment-guide#help-and-reference
Combine it with this guide: https://learn.hashicorp.com/vault/operations/ops-vault-ha-consul
I shall assume, just based on the fact that you've already gotten a single node up with a Consul backend, that you already know a little about scripting, Git, configuring new SSH connections, installing software, and Virtual Machines.
Also, these topics are hard to explain here, and there are much better resources for them elsewhere.
Further if you get stuck with the prerequisites, tools to install, or downloading the code, please have a look at the resources on the internet.
If you get an error with Vault working improperly, though, make a GitHub issue ASAP.
Anyway, with that out of the way, the short answer is this:
Step 1: Set up 3 Consul servers, each with references to each other.
Step 2: Set up 3 Vault servers, each of them independent, but each with a reference to a Consul address as its storage backend (see the config sketch after this list).
Step 3: Initialize the Cluster with your brand new Vault API.
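To make Step 2 concrete, here is a minimal sketch of what each Vault node's configuration could look like, written out with a shell heredoc. The file path, the local Consul address, and the disabled TLS are assumptions for a lab setup, not production settings:

# Write a minimal Vault config pointing at the local Consul agent (lab settings only).
cat > /etc/vault/vault.hcl <<'EOF'
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = 1
}
EOF

# Start the server with that config (run on each of the 3 Vault nodes).
vault server -config=/etc/vault/vault.hcl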
Now for the long answer.
Prerequisites
OS-Specific Prerequisites
MacOS: OSX 10.13 or later
Windows: Windows must have PowerShell 3.0 or later. If you're on Windows 7, I recommend Windows Management Framework 4.0, because it's easier to install.
Vagrant
Set up Vagrant, because it will take care of all of the networking and resource setup for the underlying infrastructure used here.
Especially for Vagrant, the Getting Started guide takes about 30 minutes once you have Vagrant and Virtualbox installed: https://www.vagrantup.com/intro/getting-started/index.html
Install Tools
Make sure you have Git installed
Install the latest version of Vagrant (NOTE: WINDOWS 7 AND WINDOWS 8 REQUIRE POWERSHELL >= 3)
Install the latest version of VMWare or Virtualbox
Download the Code for this demonstration
Related Vendor Documentation Link: https://help.github.com/articles/cloning-a-repository
git clone https://github.com/v6/super-duper-vault-train.git
Use this Code to Make a Vault Cluster
Related Vagrant Vendor Documentation Link: https://www.vagrantup.com/intro/index.html#why-vagrant-
cd super-duper-vault-train
vagrant up ## NOTE: You may have to wait a while for this, and there will be some "connection retry" errors for a long time before a successful connection occurs, because the VM is booting. Make sure you have the latest version, and try the Vagrant getting started guide, too
vagrant status
vagrant ssh instance5
After you do this, you'll see your command prompt change to show vagrant@instance5.
You can also vagrant ssh to other VMs listed in the output of vagrant status.
You can now use Vault or Consul from within the VM for which you ran vagrant ssh.
Vault
Explore the Vault Cluster
ps -ef | grep vault ## Check the Vault process (run while inside a Vagrant-managed Instance)
ps -ef | grep consul ## Check the Consul process (run while inside a Vagrant-managed Instance)
vault version ## Output should be Vault v0.10.2 ('3ee0802ed08cb7f4046c2151ec4671a076b76166')
consul version ## Output should show Consul Agent version and Raft Protocol version
The Vagrant boxes have the following IP addresses:
192.168.13.35
192.168.13.36
192.168.13.37
Vault is on port 8200.
Consul is on port 8500.
Click the Links
http://192.168.13.35:8200 (Vault)
http://192.168.13.35:8500 (Consul)
http://192.168.13.36:8200 (Vault)
http://192.168.13.36:8500 (Consul)
http://192.168.13.37:8200 (Vault)
http://192.168.13.37:8500 (Consul)
Start Vault Data
Related Vendor Documentation Link: https://www.vaultproject.io/api/system/init.html
Start Vault.
Run this command on one of the Vagrant-managed VMs, or somewhere on your computer that has curl installed.
curl -s --request PUT -d '{"secret_shares": 3,"secret_threshold": 2}' http://192.168.13.35:8200/v1/sys/init
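The response to that call is returned only once and contains the unseal keys and the initial root token, so capture it when you run it. The same call, teed to a file (jq is optional; any way of saving the JSON works):

# Save the init response; it holds the unseal keys and the root token.
curl -s --request PUT \
  -d '{"secret_shares": 3,"secret_threshold": 2}' \
  http://192.168.13.35:8200/v1/sys/init | tee vault-init.json

# If jq is installed, list just the unseal keys.
jq -r '.keys[]' vault-init.json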
Unseal Vault
Related Vendor Documentation Link: https://www.vaultproject.io/api/system/unseal.html
This will unseal the Vault at 192.168.13.35:8200. You can use the same process for 192.168.13.36:8200 and 192.168.13.37:8200.
Use one of the unseal keys you received from the v1/sys/init call to replace the placeholder value abcd12345678..., and run this on the Vagrant-managed VM.
curl --request PUT --data '{"key":"abcd12345678..."}' http://192.168.13.35:8200/v1/sys/unseal
Run that curl command again, but use a different value for "key": replace the placeholder efgh910111213... with a different key than the one you used in the previous step, again from the keys you received when running the v1/sys/init endpoint.
curl --request PUT --data '{"key":"efgh910111213..."}' http://192.168.13.35:8200/v1/sys/unseal
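To confirm a node is actually unsealed (and to watch the threshold progress between the two unseal calls), you can poll the seal-status endpoint; repeat for the .36 and .37 nodes:

# "sealed": false in the response means the node is ready.
curl -s http://192.168.13.35:8200/v1/sys/seal-status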
Non-Vagrant
Please refer to the file PRODUCTION_INSTALLATION.md in this repository.
Codified Vault Policies and Configuration
To provision Vault via its API, please refer to the provision_vault folder. It has data and scripts.
The data folder's tree corresponds to the HashiCorp Vault API endpoints, similar to the following:
https://www.hashicorp.com/blog/codifying-vault-policies-and-configuration#layout-and-design
You can use the Codified Vault Policies and Configuration with your initial root token, after initializing and unsealing Vault, to configure Vault quickly via its API.
The .json files inside each folder correspond to the payloads to send to Vault via its API, but there may also be .hcl, .sample, and .sh files for convenience's sake.
HashiCorp has written some guidance on how to get started with setting up an HA Vault and Consul configuration:
https://www.vaultproject.io/guides/operations/vault-ha-consul.html

Using Google Cloud Storage with rsync

I am new to Google Cloud. We have historically used AWS for online backups -- essentially, our local servers ran rsync to an EC2 instance at AWS and it all worked fine. I'm now trying to migrate from AWS to Google, and of course the setup is pretty different. With gsutil rsync it looked to me as though I wouldn't need to spin up a Compute Engine instance at all; I could just push stuff straight into the gs://aws_mnt bucket.
Having installed the SDK on our AWS instance, I was able to push all our backups to the gs://aws_mnt bucket very easily using gsutil cp -n.
But going forward I want to run a cron job on the local server which uses rsync rather than cp for obvious reasons.
I have two issues:
Despite reading the appropriate documentation (here), I can't figure out how to permanently authorise the local server so that I don't have to do gcloud auth login and get a code from a browser each session; for a cron job that's not really going to work.
When I try to use gsutil rsync from the local server to the gs://aws_mnt bucket that was pre-populated from AWS, I get an error:
gsutil rsync /mnt/archive/backups gs://aws_mnt/kahless
Building synchronization state...
Skipping cloud sub-directory placeholder object gs://aws_mnt/kahless/
Starting synchronization
There is some discussion of this error on github and I've produced detailed output from
gsutil -D -m rsync /mnt/archive/backups gs://aws_mnt/kahless
But since this is a brand-new install of the SDK, I can't imagine that issue hasn't already been dealt with, so I must be doing something wrong?
Rus
In response to your questions:
1. Once you have configured credentials using gcloud auth login, they remain selected until you log in with a different credential, and that state persists; you won't have to go through the browser session again unless/until you revoke those credentials. Note: if you're thinking of running commands from an unattended script (e.g., via cron), please consider using service account credentials (a minimal sketch follows after these points). For more details please see https://developers.google.com/cloud/sdk/gcloud/#gcloud.auth
2. That "Skipping..." message is not an error - it's just informing you that gsutil is skipping the attempt to download the placeholder object, because such objects aren't needed in (and would interfere with) directories in the local file system. I'll update the message in the next version of gsutil to make this clearer. So, what you saw was that the second run of gsutil rsync found nothing to do after comparing the source and destination, and completed normally.
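Regarding the unattended cron use mentioned in the first point, a minimal sketch with a current SDK and a service account might look like this. The key file path, log path, and schedule are placeholders, and the service account needs write access to the bucket:

# One-time: authorize the machine with a service account key (path is a placeholder).
gcloud auth activate-service-account --key-file=/etc/backup/gcs-backup-key.json

# Cron entry (crontab -e): nightly rsync at 02:30.
30 2 * * * gsutil -m rsync -r /mnt/archive/backups gs://aws_mnt/kahless >> /var/log/gcs-backup.log 2>&1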

Azure deployment with PowerShell, "New-AzureDeployment : There was no endpoint listening at https://management.core.windows.net/..."

Following the guide and powershell script from this article,
https://www.windowsazure.com/en-us/develop/net/common-tasks/continuous-delivery/
I've run into an extremely odd error:
9/4/2012 9:02 PM - Creating New Deployment: In progress
New-AzureDeployment : There was no endpoint listening at https://management.core.windows.net/5921d8af-88a1-4f63-9673-5e1ae1df7e8a/services/storageservices/Build_2012-09-04_02-27.1/dist/LNEC_Admin.Azure.cspkg/keys that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
It's odd because we're on build "Build_2012-09-04_08-16.1", not the one mentioned in the URL above (which no longer even exists on the filesystem). This is under Jenkins CI, which runs under the NETWORK SERVICE account. If I run it by hand with my own account, the same error results, but with lnecint in place of the build directory: https://management.core.windows.net/5921d8af-88a1-4f63-9673-5e1ae1df7e8a/services/storageservices/lnecint/keys
That keyword "lnecint" isn't mentioned anywhere in any config (I've searched every file on the entire machine and TFS server). It was the name of a storage account, but it's long ago been deleted.
VS 2012, Azure SDK 1.7.1
There's definitely an issue with your endpoint. Can you check what parameters you're passing to the New-AzureDeployment cmdlet?
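For comparison, a typical invocation with the Azure Service Management cmdlets looks roughly like this (a sketch; the subscription name, service name, paths, and slot are placeholders, and the package/configuration paths must point at the .cspkg and .cscfg your build actually produced):

# Placeholders: adjust to your subscription, service, and build output.
Select-AzureSubscription -SubscriptionName "MySubscription"

New-AzureDeployment -ServiceName "lnec-admin" `
    -Slot "Staging" `
    -Package "C:\Builds\Build_2012-09-04_08-16.1\dist\LNEC_Admin.Azure.cspkg" `
    -Configuration "C:\Builds\Build_2012-09-04_08-16.1\dist\ServiceConfiguration.Cloud.cscfg" `
    -Label "Build_2012-09-04_08-16.1"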