How to upgrade the salt-minion service on Windows servers from the Salt master without getting an error

So I have a single master with around 300+ minions connected to it. With the latest release, 3004.1, I want to upgrade all of them. They are a mixture of Linux and Windows servers.
I'm able to upgrade the Linux servers flawlessly from the salt-master using a state file, as suggested by the FAQ section of the Salt documentation, which can be found here.
However, when I try to do the same thing for Windows, the upgrade happens in the background without any issues, but I get an error: Minion did not return. [No Response]. Below is the state that installs the latest package on the Windows minion:
Upgrade Salt Minion:
  pkg.installed:
    - name: salt-minion
    - version: 3004.1
    - order: last

Enable Salt Minion:
  service.enabled:
    - name: salt-minion
    - require:
      - pkg: Upgrade Salt Minion

Restart Salt Minion:
  cmd.run:
    {%- if grains['kernel'] == 'Windows' %}
    - name: 'C:\salt\salt-call.bat service.restart salt-minion'
    {%- else %}
    - name: 'salt-call service.restart salt-minion'
    {%- endif %}
    - bg: True
    - onchanges:
      - pkg: Upgrade Salt Minion
As you can see above, it's a fairly simple state file for installing the minion service. I was wondering whether there is a way to put an explicit sleep around the restart so that the response is returned first and I can see green output with proper results.
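One workaround that is sometimes used for this (not part of the state above, just a hedged sketch) is to launch the Windows restart in a detached process that sleeps for a few seconds first, so the minion has time to report the state results back to the master before the salt-minion service goes down. The 10-second delay below is an arbitrary illustrative value, and it assumes PowerShell is available on the Windows minions:

Restart Salt Minion:
  cmd.run:
    {%- if grains['kernel'] == 'Windows' %}
    # Detach a PowerShell process that waits before restarting the service,
    # giving the minion time to return the job result to the master first.
    - name: 'start powershell "Start-Sleep -Seconds 10; Restart-Service -Name salt-minion"'
    {%- else %}
    - name: 'salt-call service.restart salt-minion'
    - bg: True
    {%- endif %}
    - onchanges:
      - pkg: Upgrade Salt Minion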
Note: I can re-run the state file and it will give me the proper result the second time, since it skips the minions that have already been upgraded. But I think there should be a more correct way of doing this via Salt.
Has anyone else upgraded their Windows minions via the master? Please share your thoughts.
Second question: since this is somewhat related, I'm asking it here in case anyone has found a solution. I'm unable to install the latest Salt package on Windows Server 2008. Has anyone else faced a similar problem? I know the OS is old and at EOL, but due to some unavoidable circumstances the upgrade will take time. Suggestions on this, please.

Related

gcloud app deploy flexible environment hangs

The title is self-explanatory. I simply downloaded the flexible-hello-world app and deployed it with (almost) no modifications. I deployed it to a service called some-service using this app.yaml:
# Copyright 2017, Google, Inc.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# [START gae_flex_quickstart_yaml]
runtime: nodejs
env: flex
service: some-service
# This sample incurs costs to run on the App Engine flexible environment.
# The settings below are to reduce costs during testing and are not appropriate
# for production use. For more information, see:
# https://cloud.google.com/appengine/docs/flexible/nodejs/configuring-your-app-with-app-yaml
manual_scaling:
  instances: 1
resources:
  cpu: 1
  memory_gb: 0.5
  disk_size_gb: 10
# [END gae_flex_quickstart_yaml]
I have a billing account enabled for my project.
It hangs at this line:
...
7104cb4c0c814fa53787009 size: 2385
Finished Step #1
PUSH
DONE
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Updating service [some-service] (this may take several minutes)...⠹
When I go to the App Engine console I see it has been deployed, and I can access the URL here: https://some-service-dot-ashored-cloud-dv.uk.r.appspot.com/
But the 502 never goes away.
Help!
EDIT:
Some more information: while it is still hung on the deploy command, I run this in another terminal:
gcloud app instances list --service some-service
and I get:
SERVICE VERSION ID VM_STATUS DEBUG_MODE
some-service 20201119t184828 aef-some--service-20201119t184828-fr7n TERMINATED
EDIT 2:
When I try to SSH to it I get more weirdness:
gcloud app instances ssh aef-some--service-20201119t184828-fr7n --service some-service --version 20201119t184828
WARNING: This instance is serving live application traffic. Any changes made could
result in downtime or unintended consequences.
Do you want to continue (Y/n)?
Sending public key to instance [apps/ashored-cloud-dv/services/some-service/versions/20201119t184828/instances/aef-some--service-20201119t184828-fr7n].
Waiting for operation [apps/ashored-cloud-dv/operations/9de7b298-f4e9-47a7-8a8e-11411e649d50] to complete...done.
ERROR: gcloud crashed (TypeError): can only concatenate str (not "NoneType") to str
EDIT 3:
gcloud --version output:
Google Cloud SDK 319.0.0
bq 2.0.62
cloud-build-local
core 2020.11.13
gsutil 4.55
tl;dr: permissions
I removed the Editor permission from my default App Engine service account, as recommended in the IAM dashboard.
Nowhere in the docs (that I could find) does it tell you what permissions are needed to deploy an App Engine flexible service.
Turns out, you need:
Logs Writer
Storage Object Viewer
Without Storage Object Viewer you'll get an error on deployment telling you the exact issue. Without Logs Writer you will not get an error, but the service will never come up.
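If it helps anyone, those two roles can be granted to the App Engine default service account from the command line; a rough sketch, assuming the default PROJECT_ID@appspot.gserviceaccount.com account (substitute your own project ID):

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_ID@appspot.gserviceaccount.com" \
    --role="roles/logging.logWriter"

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_ID@appspot.gserviceaccount.com" \
    --role="roles/storage.objectViewer"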
What a long 10 days...
EDIT: I was wrong, it says here in the docs what permissions you need.
I asked Google Support to file an internal bug that the correct error message is not returned if you do not have Logs Writer.

Rundeck NodesetEmptyException: No matched nodes

Unfortunately, we occasionally receive the following Rundeck exception for some of the scheduled jobs:
com.dtolabs.rundeck.core.NodesetEmptyException: No matched nodes: NodeSet{includes={name= myrealhost.com, dominant=false, }}
at com.dtolabs.rundeck.core.execution.workflow.NodeFirstWorkflowExecutor.validateNodeSet(NodeFirstWorkflowExecutor.java:369)
at com.dtolabs.rundeck.core.execution.workflow.NodeFirstWorkflowExecutor.executeWorkflowImpl(NodeFirstWorkflowExecutor.java:90)
at com.dtolabs.rundeck.core.execution.workflow.BaseWorkflowExecutor.executeWorkflow(BaseWorkflowExecutor.java:222)
at com.dtolabs.rundeck.core.execution.WorkflowExecutionServiceThread.runWorkflow(WorkflowExecutionServiceThread.java:83)
at com.dtolabs.rundeck.core.logging.LoggingManagerImpl$MyPluginLoggingManager.runWith(LoggingManagerImpl.java:148)
at com.dtolabs.rundeck.core.execution.WorkflowExecutionServiceThread.run(WorkflowExecutionServiceThread.java:74)
Exception: class com.dtolabs.rundeck.core.NodesetEmptyException: No matched nodes: NodeSet{includes={name= myrealhost.com, dominant=false, }}
No matched nodes: NodeSet{includes={name=myrealhost.com, dominant=false, }}
While hunting for a solution online, I came across a relevant GitHub issue: https://github.com/rundeck/rundeck/issues/2942#issuecomment-360573113. Unfortunately, the suggested fix didn't help.
Please note the following details for your reference:
Rundeck version: 3.1.0-20190731
DB Type: MySQL 8
Install type: war
OS: Ubuntu 18
Use Asynchronous Cache: True
Cache Delay: 3600
Synchronous First Load: Enabled
Inventory source: Ansible Dynamic Inventory -- a Python script to load data from EtcD
Another relevant GitHub issue: https://github.com/rundeck/rundeck/issues/4231
Also, I haven't noticed any errors from the Python inventory script other than the aforementioned exception in the Rundeck job log.
I had no choice but to reach out to the Rundeck community on Stack Overflow, as the relevant GitHub issues seemed dormant.
I would appreciate any help on this front.
Thanks in advance.
It seems to be a node cache problem that happens sometimes. You can try this workaround.

Go Stackdriver debugger error loading program

I am trying to set up Stackdriver debugging using Go. Using the article and this great Medium post, I came up with this solution.
Key parts, in cloudbuild.yaml:
- name: gcr.io/cloud-builders/wget
  args: [
    "-O",
    "go-cloud-debug",
    "https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug"
  ]
...
In the Dockerfile I have:
...
COPY gopath/bin/stackdriver-demo /stackdriver-demo
ADD go-cloud-debug /
ADD source-context.json /
CMD ["/go-cloud-debug","-sourcecontext=./source-context.json", "-appmodule=go-errrep","-appversion=1.0","--","/stackdriver-demo"]
...
However, the pods keep crashing; the container logs show this error:
Error loading program: decoding dwarf section info at offset 0x0: too short
EDIT: Using https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug may be outdated, as I haven't seen it used outside Daz's Medium post. The official docs use the package cloud.google.com/go/cmd/go-cloud-debug-agent.
I have updated the cloudbuild.yaml file to install this package:
- name: 'gcr.io/cloud-builders/go'
  args: ["get", "-u", "cloud.google.com/go/cmd/go-cloud-debug-agent"]
  env: ['PROJECT_ROOT=github.com/roberson34/stackdriver-demo', 'CGO_ENABLED=0', 'GOOS=linux']
- name: 'gcr.io/cloud-builders/go'
  args: ["install", "cloud.google.com/go/cmd/go-cloud-debug-agent"]
  env: ['PROJECT_ROOT=github.com/roberson34/stackdriver-demo', 'CGO_ENABLED=0', 'GOOS=linux']
And in the Dockerfile I can access the binary at gopath/bin/go-cloud-debug-agent.
When I execute the gopath/bin/go-cloud-debug-agent with my own program as an argument:
/go-cloud-debug-agent -sourcecontext=./source-context.json -appmodule=go-errrep -appversion=1.0 -- /stackdriver-demo
I get another opaque error:
Error loading program: AttrStmtList not present or not int64 for unit 88
So basically, neither the cloud-debug binary from https://storage.googleapis.com/cloud-debugger/compute-go/go-cloud-debug nor the cloud-debug-agent binary from the package cloud.google.com/go/cmd/go-cloud-debug-agent works; they give different errors.
Would appreciate any tips on what I'm doing wrong and how to fix it.
OK :-)
Yes, you should follow the current Stackdriver documentation, e.g. go-cloud-debug-agent
Unfortunately, there are now various issues with my post including a (currently broken) gcr.io/cloud-builders/kubectl for regions.
I think your issue pertains to your use of golang:alpine. Alpine uses musl rather than the glibc you find on most other Linux distros, and so you really must compile for Alpine to ensure your binaries reference the correct libc.
I'm able to get your solution working primarily by switching your Dockerfile to pull the Cloud Debug Agent while on Alpine and to compile your source on Alpine:
FROM golang:alpine
RUN apk add git
RUN go get -u cloud.google.com/go/cmd/go-cloud-debug-agent
ADD main.go src
RUN CGO_ENABLED=0 go build -gcflags=all='-N -l' src/main.go
ADD source-context.json /
CMD ["bin/go-cloud-debug-agent","-sourcecontext=/source-context.json", "-appmodule=stackdriver-demo","-appversion=1.0","--","main"]
I think that should get you beyond the errors that you documented and you should be able to deploy your container to Kubernetes.
I've made my version of your image publicly available (and will retain it for a few days for you):
gcr.io/dazwilkin-190402-55473323/roberson34#sha256:17cb45f1320e2fe04e0681310506f4c229896429192b0d1c2c8dc20ed54adb0d
You may wish to reference it (by that digest) in your deployment.yaml
NB For Error Reporting to be "interesting", your code needs to generate errors, and with your example this is going to be challenging (usually a good thing). You may consider adding another handler that always returns an error so that you can test the service; a sketch follows.
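As a minimal sketch of such a deliberately failing handler (the /boom route and messages are made up for illustration):

package main

import (
	"log"
	"net/http"
)

func main() {
	// Hypothetical handler that always fails, purely so the deployed
	// service produces something for Error Reporting to display.
	http.HandleFunc("/boom", func(w http.ResponseWriter, r *http.Request) {
		log.Println("deliberate failure for testing Error Reporting")
		http.Error(w, "deliberate failure", http.StatusInternalServerError)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}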

Ansible deploy security certificates to remote servers

I'm a bit stumped on an Ansible issue. I've gotten a portion of my setup script working for my database servers, and I would like Ansible to be able to manage each server's postgresql.conf file. I currently have it pushing out an up-to-date copy of the config file, but this has presented a problem.
Our security certificates are unique to each server, and the postgresql.conf has parameters for setting these up for each server. I've currently got Ansible calculating the proper initial values for things like shared_buffers and effective_cache_size, but do not know how to get it to push a unique certificate out to each remote server, or to uniquely set the name in the config file to match the certificate name.
Are these even possible with Ansible?
I had a similar requirement recently to deploy a specific file (Java keystore) per host, along with the relevant keystore password to decrypt and use it.
For the per-host keystore password, I used host_vars. Inside the group inventory file inventory/demoGroupName/hosts:
hostname1.com keystore_password="{{ vault_keystore_password_hostname1_com }}"
hostname2.com keystore_password="{{ vault_keystore_password_hostname2_com }}"
and then in inventory/demoGroupName/vault:
vault_keystore_password_hostname1_com: superSecurePassword
vault_keystore_password_hostname2_com: nonRepeatedPassword
(Note: the use of vault_ as a prefix is recommended in Ansible's best practices, but feel free to modify this to suit your scenario)
and then in the job, simply place {{ keystore_password }} in the relevant spot; for example:
- name: arbitrary tasks
  command: configure-keystore --password="{{ keystore_password }}"
Now in my scenario, I placed the individual keystores into the roles/role_name/files directory and copied them across by substituting {{ inventory_hostname }} into the filename (a sketch of that approach is below), but this answer gives a better solution in my opinion. Either will work for your situation, but the latter is probably better long-term. If the name of the certificate can be made the same for all hosts, that will also simplify your situation somewhat.
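For the original certificate question, a rough sketch of that per-host file approach; the file names, destination paths, and the ssl_cert_file setting are illustrative assumptions, not taken from the question:

# roles/role_name/files/ holds one certificate per host, e.g. hostname1.com.crt
- name: Copy this host's certificate to the server
  copy:
    src: "{{ inventory_hostname }}.crt"
    dest: "/etc/postgresql/{{ inventory_hostname }}.crt"
    owner: postgres
    group: postgres
    mode: '0600'

- name: Point postgresql.conf at the per-host certificate
  lineinfile:
    path: /etc/postgresql/postgresql.conf
    regexp: '^#?ssl_cert_file\s*='
    line: "ssl_cert_file = '/etc/postgresql/{{ inventory_hostname }}.crt'"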

Are requisites required, or is order sufficient?

The Salt docs are full of this kind of pattern:
apache:
  pkg:
    - installed
  service:
    - running
    - require:
      - pkg: apache
This repetition ("install apache, now check whether apache was installed") seems to be a violation of don't-repeat-yourself (DRY). So is it necessary?
From "Understanding State Ordering":
To accomplish something similar to how classical imperative systems function all requisites can be omitted and the failhard option then set to True in the master configuration, this will stop all state runs at the first instance of a failure.
This seems to imply that the use of requisites everywhere is actually optional (assuming that the declaration order is correct) - but I'd like to know for sure.
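For reference, the master-side option mentioned in that quote is a single setting; a minimal sketch, assuming the default master config file location:

# /etc/salt/master
# Abort the whole state run at the first failed state instead of
# relying on per-state requisites.
failhard: True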
It is a remnant of the pre-0.15 days, when states weren't executed top-down.
Ordering is now sufficient.
States are now executed in the order they are declared in your sls files. Where you will still want to use "require" is if you want to ensure a certain state executes successfully before another.
For example, you may want to ensure a software package is installed correctly before attempting to lay down a config file.
apache:
  pkg:
    - installed
  file:
    - managed
    - name: /etc/apache/httpd.conf
    - source: salt://apache/httpd.conf
    - require:
      - pkg: apache
Without the "require" in the above example, the config file would be laid down even if the apache pkg failed to install.