How to restart ubuntu agent on Azure Build? - azure-devops

Long story short: I tried several of the solutions here for killing VBCSCompiler before the MSBuild task, and none of them worked. I am going to try one more option before calling it a day and sticking with the windows2019 agent, even though the build time will be tripled.
So, after the NuGet restore task, I need to reboot the Ubuntu agent (hosted in the Azure Pipelines agent pool). I added a command-line task, but I am not sure what to write for the script.
I tried the following script command:
sudo reboot
but it didn't work (it kept running for a while, so I just cancelled the build).
I've also tried this command instead:
init 6
but I got an error:
Failed to set wall message, ignoring: Interactive authentication required.
Failed to reboot system via logind: Interactive authentication required.
Failed to open initctl fifo: Permission denied
Failed to talk to init daemon.

That is not possible: if you restart a hosted agent in the middle of a job, your build will fail. This is the reason why it is not allowed.
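Since the agent cannot be rebooted mid-job, one thing that may be worth trying instead is asking the .NET build servers to exit after the restore step. The sketch below assumes the `dotnet` CLI is on the agent's PATH and that the compiler server was started through it; `stop_build_servers` is a hypothetical helper name, and it is not confirmed that this solves the original VBCSCompiler problem (a compiler server started by Mono's msbuild would not be affected).

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask running .NET build servers (MSBuild worker nodes,
# VBCSCompiler, Razor) to exit, as an alternative to rebooting the agent.
stop_build_servers() {
  if command -v dotnet >/dev/null 2>&1; then
    # `dotnet build-server shutdown` ships with the .NET SDK CLI.
    dotnet build-server shutdown || echo "build-server shutdown reported an error"
  else
    echo "dotnet CLI not found; nothing to shut down"
  fi
  # Never fail the pipeline step because of this cleanup.
  return 0
}
```

In a command-line task this would be called as `stop_build_servers` right after the NuGet restore step.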

Related

bash task in Azure Devops not working on self hosted windows agent

Bash task is giving below error when running on self hosted windows agent -
Windows Subsystem for Linux has no installed distributions.
##[error]The process 'C:\Windows\system32\bash.exe' failed with exit code 1
The Windows Subsystem for Linux is installed and I can execute bash scripts when I log in to the agent machine, but the bash task does not work.
Happy to provide more information if required
That is because WSL is installed per user, but when the self-hosted agent runs as a service, it uses the NETWORK SERVICE/SYSTEM account instead of a local user account.
You could try setting Log On As to a specific user (it should be the same user that installed WSL), then restart your server.
Detailed steps for setting Log On As to a specific user:
Press the Win + R keys on your keyboard, to open the Run window. Then, type “services.msc” and hit Enter or press OK.
Find the service for the self-hosted agent and right click it, select Properties, switch to Log On tab.
Change the account to the specific user.
For example, I log in to the Server 2019 machine with the xxxxxx\leoliu account and install WSL with that account. After that, I open services.msc and change Log On As to that user.
Last but not least, restart your server; restarting just the service is not enough.
Now I can use my self-hosted agent to run the bash task.

Agent version 2.173.0 fails to connect to Azure DevOps

Agent Version and Platform
2.173.0
on
centos-release-7-6.1810.2.el7.centos.x86_64
It's a release agent for a deployment pool.
Azure DevOps Type and Version
dev.azure.com (cloud)
What's not working?
# Running run once with agent version 2.160.1
./run.sh --once
Scanning for tool capabilities.
Connecting to the server.
2020-08-25 21:31:02Z: Listening for Jobs
Agent update in progress, do not shutdown agent.
Downloading 2.173.0 agent
Waiting for current job finish running.
Generate and execute update script.
Agent will exit shortly for update, should back online within 10 seconds.
‘/root/azagent/_diag/SelfUpdate-20200825-213148.log’ -> ‘/root/azagent/_diag/SelfUpdate-20200825-213148.log.succeed’
Scanning for tool capabilities.
Connecting to the server.
# this now runs indefinitely
Is there a way to stop the auto update? Multiple agents on production machines are offline and I have, as of now, no idea how to fix that.
agent.log
Edit: It is a Release Agent in a Deployment Group. Also, there is a Github issue now https://github.com/microsoft/azure-pipelines-agent/issues/3093
To resolve the "Authentication failed with status code 401" error, you can try the steps below:
1. Create a new PAT with manage permission, then reconfigure the agent with the config.sh file.
2. If that does not work, try creating a new agent pool and registering new agents in it.
To stop the auto update, disable the agent update option (Organization settings => Agent Pools => Settings).

Connect containerized self-hosted agent with Azure DevOps

I followed the instructions in the ms docs guide, and the agent started without any issues. However it never showed up in my agent pool. I tried a different version of the start.sh script found on github and it connected immediately. Is there anything else I can do to try and troubleshoot this? Logs from the non-working agent below
❯ kubectl logs azpagent-55864668dc-zgdrn
1. Determining matching Azure Pipelines agent...
2. Downloading and installing Azure Pipelines agent...
3. Configuring Azure Pipelines agent...
>> End User License Agreements:
Building sources from a TFVC repository requires accepting the Team Explorer Everywhere End User License Agreement. This step is not required for building sources from Git repositories.
A copy of the Team Explorer Everywhere license agreement can be found at:
/azp/agent/externals/tee/license.html
>> Connect:
Connecting to server ...
>> Register Agent:
Scanning for tool capabilities.
Connecting to the server.
Successfully replaced the agent
Testing agent connection.
2019-08-03 04:22:56Z: Settings Saved.
4. Running Azure Pipelines agent...
Starting Agent listener interactively
Started listener process
Started running service
Scanning for tool capabilities.
Connecting to the server.
2019-08-03 04:23:08Z: Agent connect error: The signature is not valid.. Retrying until reconnected.
Not really sure what else to try -- has anyone else seen this issue, or had success with the linux agent guide?
Looking at the error message:
The signature is not valid.
There might be a problem with the provided PAT. I'd suggest generating a new PAT, as described by this guide, and trying again.
Let me know if this has helped.
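Before regenerating the PAT, it can help to check whether the current one is accepted at all. The sketch below relies on the fact that Azure DevOps accepts a PAT as HTTP Basic authentication with an empty user name; the helper names `pat_auth_header` and `check_pat` are hypothetical, and the projects endpoint is used only as a convenient authenticated request.

```shell
#!/usr/bin/env bash
# Build the Basic auth header for a PAT: empty user name, PAT as the password.
pat_auth_header() {
  printf 'Authorization: Basic %s' "$(printf ':%s' "$1" | base64 | tr -d '\n')"
}

# Hypothetical check: print the HTTP status code of an authenticated request.
# $1 = organization URL (the value passed to config.sh as --url), $2 = PAT.
check_pat() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    -H "$(pat_auth_header "$2")" \
    "$1/_apis/projects?api-version=6.0"
}
```

For example, `check_pat "$AZP_URL" "$(cat "$AZP_TOKEN_FILE")"` printing 200 suggests the token itself is fine and the problem lies elsewhere; any other status is worth a closer look.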
Update
According to the error info: The signature is not valid..
Are you building sources from a TFVC repository, which requires accepting the Team Explorer Everywhere End User License Agreement? This step is not required for building sources from Git repositories.
If so, try building from a Git repo.
The different version of the start.sh script that you referred to is deprecated. It's for an old build agent.
Based on this and the related error "The signature is not valid.. Retrying until reconnected.", here are a few things I would suggest:
You may be on a pretty old agent version; try the latest agent version:
https://github.com/microsoft/azure-pipelines-agent/releases
You need to restart the agent process for environment changes to take effect.
Check with your IT department to make sure the network between your build machine and the TFS server/Azure DevOps Service is reliable, and see whether there has been any change in your network.
Also make sure your build machine/VM has not run out of resources.
In case this or a similar issue occurs for anyone else, the suggestion from #juliobbv was very helpful. If you comment out the last line of the script, and replace it with
./bin/Agent.Listener run & wait $!
you can get a clearer view of any error messages.
In my case, I didn't realize that AGENT_NAME and POOL were no longer the same variable, and the original error message didn't indicate that the issue was my lack of permissions to the default pool.
My final changes to the script are below -- I default the agent name to the hostname, and keep the previous behavior of using a custom pool:
./config.sh --unattended \
--agent "$(hostname)" \
--url "$AZP_URL" \
--auth PAT \
--token "$(cat "$AZP_TOKEN_FILE")" \
--pool "${AZP_POOL:-Default}" \
--work "${AZP_WORK:-_work}" \
--replace \
--acceptTeeEula & wait $!
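Because the original error message hid the real cause (the wrong pool), a small preflight step that prints the effective values before calling config.sh can make this kind of mismatch visible. This is a minimal sketch, not part of the original start.sh: the function name `preflight` is hypothetical, and it only reuses the variables and defaults from the config.sh call above.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: resolve the values the config.sh call relies on and
# fail early, with a readable message, if a required one is missing.
preflight() {
  if [ -z "${AZP_URL:-}" ]; then
    echo "error: AZP_URL is not set" >&2
    return 1
  fi
  if [ -z "${AZP_TOKEN_FILE:-}" ] || [ ! -r "$AZP_TOKEN_FILE" ]; then
    echo "error: AZP_TOKEN_FILE is unset or not readable" >&2
    return 1
  fi
  # Same defaults as the config.sh call: hostname as agent name, Default pool.
  echo "agent=$(hostname)"
  echo "pool=${AZP_POOL:-Default}"
  echo "work=${AZP_WORK:-_work}"
}
```

Calling `preflight || exit 1` just before config.sh shows in the container log which agent name and pool are about to be registered.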

Workload Scheduler job won't enable

I'm trying to create a Workload Scheduler job that executes a curl command.
In Steps I've selected Start a program for the step and RP_CLOUD as the Agent (it's the only option). I pasted my curl command into Program.
Now when I try to enable the job I get a popup saying:
AWSUI4177E Unable to update the Process.
AWSUI4299E An internal error has occurred: AWSPRE001E The user "paul.carron#anaeko.com.5c81ed484ccf4c54aa9e348e" cannot create a job of type "executable" on the "RP_CLOUD" workstation. Download and install a Workload Automation Agent on a different machine.
The curl statement works when executed in my Terminal. What am I doing wrong?
There are some security constraints on running jobs on the agents provided by the infrastructure.
I see two options:
Use the restful job type (since you are invoking a curl command)
Install an agent

TFS Build "PowerShell on Target Machines" Step Fails: How to debug?

I'm trying to automate the deployment of the solution my team is working on through TFS Build server. One of the steps which executes a PowerShell script on the target machine fails with the following error:
Microsoft ODBC Driver 11 for SQL Server : Login failed for user 'sa'..
The PowerShell script I'm trying to execute does in fact connect to multiple databases using the sa credentials. When I try to execute the same script passing it the exact same arguments by hand (i.e: executing the script from the target machine VM itself) it works like a charm. But when it is being executed as part of the build steps it fails with the aforementioned error.
Is there a way to further debug the issue? It would be great if there is a way to output trace statements from the script so I could have some insight on what is actually going on.
Usually all related errors should show up in the TFS build log. To narrow down your issue, you can log on to the TFS build agent with the credentials used for the build service and run the PowerShell script manually.
Executing the script with your own account will not help with this issue. This kind of problem is usually related to permissions: your build service account lacks the required permission. Try adding it to the Administrator or SQL Administrator group and run the build again.