How to match the PrometheusRule to the AlertmanagerConfig with Prometheus Operator - kubernetes

I have multiple prometheusRules(rule a, rule b), and each rule defined different exp to constraint the alert; then, I have different AlertmanagerConfig(one receiver is slack, then other one's receiver is opsgenie); How can we make a connection between rules and alertmanagerconfig? for example: if rule a is triggered, I want to send message to slack; if rule b is triggered, I want to send message to opsgenie.
Here is what I tried, however, that does not work. Did I miss something?
This is prometheuisRule file
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
prometheus: service-prometheus
role: alert-rules
app: kube-prometheus-stack
release: monitoring-prom
name: rule_a
namespace: monitoring
spec:
groups:
- name: rule_a_alert
rules:
- alert: usage_exceed
expr: salesforce_api_usage > 100000
labels:
severity: urgent
This is alertManagerConfig
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
labels:
alertmanagerConfig: slack
name: slack
namespace: monitoring
resourceVersion: "25842935"
selfLink: /apis/monitoring.coreos.com/v1alpha1/namespaces/monitoring/alertmanagerconfigs/opsgenie-and-slack
uid: fbb74924-5186-4929-b363-8c056e401921
spec:
receivers:
- name: slack-receiver
slackConfigs:
- apiURL:
key: apiURL
name: slack-config
route:
groupBy:
- job
groupInterval: 60s
groupWait: 60s
receiver: slack-receiver
repeatInterval: 1m
routes:
- matchers:
- name: job
value: service_a
receiver: slack-receiver

You need to match on a label of the alert, in your case you're trying to match on the label job with the value service_a which doesn't exist. You could either match on a label that does exist in the prometheuisRule file, eg severity, by changing the match in the alertManagerConfig file:
route:
routes:
- match:
severity: urgent
receiver: slack-receiver
or you could add another label to the prometheuisRule file:
spec:
groups:
- name: rule_a_alert
rules:
- alert: usage_exceed
expr: salesforce_api_usage > 100000
labels:
severity: urgent
job: service_a

Related

Tekton YAML TriggerTemplate - string substitution

I have this kind of yaml file to define a trigger
`
apiVersion: triggers.tekton.dev/v1alpha1
kind: TriggerTemplate
metadata:
name: app-template-pr-deploy
spec:
params:
- name: target-branch
- name: commit
- name: actor
- name: pull-request-number
- name: namespace
resourcetemplates:
- apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
generateName: app-pr-$(tt.params.actor)-
labels:
actor: $(tt.params.actor)
spec:
serviceAccountName: myaccount
pipelineRef:
name: app-pr-deploy
podTemplate:
nodeSelector:
location: somelocation
params:
- name: branch
value: $(tt.params.target-branch)
** - name: namespace
value: $(tt.params.target-branch)**
- name: commit
value: $(tt.params.commit)
- name: pull-request-number
value: $(tt.params.pull-request-number)
resources:
- name: app-cluster
resourceRef:
name: app-location-cluster
`
The issue is that sometime target-branch is like "integration/feature" and then the namespace is not valid
I would like to check if there is an unvalid character in the value and replace it if there is.
Any way to do it ?
Didn't find any valuable way to do it beside creating a task to execute this via shell script later in the pipeline.
This is something you could do from your EventListener, using something such as:
apiVersion: triggers.tekton.dev/v1alpha1
kind: EventListener
metadata:
name: xx
spec:
triggers:
- name: demo
interceptors:
- name: addvar
ref:
name: cel
params:
- name: overlays
value:
- key: branch_name
expression: "body.ref.split('/')[2]"
bindings:
- ref: your-triggerbinding
template:
ref: your-triggertemplate
Then, from your TriggerTemplate, you would add the "branch_name" param, parsed from your EventListener.
Note: payload from git notification may vary. Sample above valid with github. Translating remote/origin/master into master, or abc/def/ghi/jkl into ghi.
I've created a separate task that is doing all the magic I needed and output a valid namespace name into a different variable.
Then instead of use namespace variable, i use valid-namespace all the way thru the pipeline.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
name: validate-namespace-task-v1
spec:
description: >-
This task will validate namespaces
params:
- name: namespace
type: string
default: undefined
results:
- name: valid-namespace
description: this should be a valid namespace
steps:
- name: triage-validate-namespace
image: some-image:0.0.1
script: |
#!/bin/bash
echo -n "$(params.namespace)" | sed "s/[^[:alnum:]-]/-/g" | tr '[:upper:]' '[:lower:]'| tee $(results.valid-namespace.path)
Thanks

How to trigger an existing Argo cronworkflow?

I have tried many versions of this template below
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
name: tibco-events-sensor
spec:
template:
metadata:
annotations:
sidecar.istio.io/inject: 'false'
serviceAccountName: operate-workflow-sa
dependencies:
- name: tibco-dep
eventSourceName: tibco-events-source
eventName: whatever
triggers:
- template:
name: has-wf-event-trigger
argoWorkflow:
group: argoproj.io
version: v1alpha1
resource: Workflow
operation: resubmit
metadata:
generateName: has-wf-argo-events-
source:
resource:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
name: has-wf-full-refresh
Keep getting errors of workflows not found
"rpc err
or: code = NotFound desc = workflows.argoproj.io \"has-wf-full-refresh\" not found"
I have hundreds of workflows launched as cronworkflows. And i would like to switch them to be event driven vs cron based. Id prefer not to change already existing flows. I just want to submit or resubmit them.
I figured out that the argoWorkflow trigger template doesnt support CronWorkflows. I ended up using the httptrigger template.
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
name: tibco-events-sensor
spec:
template:
metadata:
annotations:
sidecar.istio.io/inject: 'false'
serviceAccountName: operate-workflow-sa
dependencies:
- name: tibco-dep
eventSourceName: tibco-events-source
eventName: whatever
triggers:
- template:
name: http-trigger
http:
url: http://argo-workflows.argo-workflows:2746/api/v1/workflows/lab-uat/submit
secureHeaders:
- name: Authorization
valueFrom:
secretKeyRef:
name: argo-workflows-sa-token
key: bearer-token
payload:
- src:
dependencyName: tibco-dep
value: CronWorkflow
dest: resourceKind
- src:
dependencyName: tibco-dep
value: coinflip
dest: resourceName
- src:
dependencyName: tibco-dep
value: coinflip-event-
dest: submitOptions.generateName
method: POST
retryStrategy:
steps: 3
duration: 3s
policy:
status:
allow:
- 200

Debugging why Reconcile triggers on Kubernetes Custom Operator

I've a custom operator that listens to changes in a CRD I've defined in a Kubernetes cluster.
Whenever something changed in the defined custom resource, the custom operator would reconcile and idempotently create a secret (that would be owned by the custom resource).
What I expect is for the operator to Reconcile only when something changed in the custom resource or in the secret owned by it.
What I observe is that for some reason the Reconcile function triggers for every CR on the cluster in strange intervals without observable changes to related entities. I've tried focusing on a specific instance of the CR and follow the times in which Reconcile was called for it. The intervals of these calls are very strange. It seems that the calls are alternating between two series - one starts at 10 hours and diminishes seven minutes at a time. The other starts at 7 minutes and grows by 7 minutes a time.
To demonstrate, Reconcile triggered at these times (give or take a few seconds):
00:00
09:53 (10 hours - 1*7 minute interval)
10:00 (0 hours + 1*7 minute interval)
19:46 (10 hours - 2*7 minute interval)
20:00 (0 hours + 2*7 minute interval)
29:39 (10 hours - 3*7 minute interval)
30:00 (0 hours + 3*7 minute interval)
Whenever the diminishing intervals become less than 7 hours, it resets back to 10 hour intervals. The same with the growing series - as soon as the intervals are higher than 3 hours it resets back to 7 minutes.
My main question is how can I investigating why Reconcile is being triggered?
I'm attaching here the manifests for the CRD, the operator and a sample manifest for a CR:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
annotations:
controller-gen.kubebuilder.io/version: v0.4.1
creationTimestamp: "2021-10-13T11:04:42Z"
generation: 1
name: databaseservices.operators.talon.one
resourceVersion: "245688703"
uid: 477f8d3e-c19b-43d7-ab59-65198b3c0108
spec:
conversion:
strategy: None
group: operators.talon.one
names:
kind: DatabaseService
listKind: DatabaseServiceList
plural: databaseservices
singular: databaseservice
scope: Namespaced
versions:
- name: v1alpha1
schema:
openAPIV3Schema:
description: DatabaseService is the Schema for the databaseservices API
properties:
apiVersion:
description: 'APIVersion defines the versioned schema of this representation
of an object. Servers should convert recognized schemas to the latest
internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
type: string
kind:
description: 'Kind is a string value representing the REST resource this
object represents. Servers may infer this from the endpoint the client
submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
type: string
metadata:
type: object
spec:
description: DatabaseServiceSpec defines the desired state of DatabaseService
properties:
cloud:
type: string
databaseName:
description: Foo is an example field of DatabaseService. Edit databaseservice_types.go
to remove/update
type: string
serviceName:
type: string
servicePlan:
type: string
required:
- cloud
- databaseName
- serviceName
- servicePlan
type: object
status:
description: DatabaseServiceStatus defines the observed state of DatabaseService
type: object
type: object
served: true
storage: true
subresources:
status: {}
status:
acceptedNames:
kind: DatabaseService
listKind: DatabaseServiceList
plural: databaseservices
singular: databaseservice
conditions:
- lastTransitionTime: "2021-10-13T11:04:42Z"
message: no conflicts found
reason: NoConflicts
status: "True"
type: NamesAccepted
- lastTransitionTime: "2021-10-13T11:04:42Z"
message: the initial names have been accepted
reason: InitialNamesAccepted
status: "True"
type: Established
storedVersions:
- v1alpha1
----
apiVersion: operators.talon.one/v1alpha1
kind: DatabaseService
metadata:
creationTimestamp: "2021-10-13T11:14:08Z"
generation: 1
labels:
app: talon
company: amber
repo: talon-service
name: db-service-secret
namespace: amber
resourceVersion: "245692590"
uid: cc369297-6825-4fbf-aa0b-58c24be427b0
spec:
cloud: google-australia-southeast1
databaseName: amber
serviceName: pg-amber
servicePlan: business-4
----
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "75"
secret.reloader.stakater.com/reload: db-credentials
simpledeployer.talon.one/image: <path_to_image>/production:latest
creationTimestamp: "2020-06-22T09:20:06Z"
generation: 77
labels:
simpledeployer.talon.one/enabled: "true"
name: db-operator
namespace: db-operator
resourceVersion: "245688814"
uid: 900424cd-b469-11ea-b661-4201ac100014
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
name: db-operator
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
name: db-operator
spec:
containers:
- command:
- app/db-operator
env:
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: OPERATOR_NAME
value: db-operator
- name: AIVEN_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: db-credentials
- name: AIVEN_PROJECT
valueFrom:
secretKeyRef:
key: projectname
name: db-credentials
- name: AIVEN_USERNAME
valueFrom:
secretKeyRef:
key: username
name: db-credentials
- name: SENTRY_URL
valueFrom:
secretKeyRef:
key: sentry_url
name: db-credentials
- name: ROTATION_INTERVAL
value: monthly
image: <path_to_image>/production#sha256:<some_sha>
imagePullPolicy: Always
name: db-operator
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: db-operator
serviceAccountName: db-operator
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2020-06-22T09:20:06Z"
lastUpdateTime: "2021-09-07T11:56:07Z"
message: ReplicaSet "db-operator-cb6556b76" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2021-09-12T03:56:19Z"
lastUpdateTime: "2021-09-12T03:56:19Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 77
readyReplicas: 1
replicas: 1
updatedReplicas: 1
Note:
When Reconcile finishes, I return:
return ctrl.Result{Requeue: false, RequeueAfter: 0}
So that shouldn't be the reason for the repeated triggers.
I will add that I have recently updated the Kubernetes cluster version to v1.20.8-gke.2101.
This would require more info on how your controller is set up. For example what is the sync period you have set. This could be due to default sync period set which reconciles all the objects at given interval of time.
SyncPeriod determines the minimum frequency at which watched resources are
reconciled. A lower period will correct entropy more quickly, but reduce
responsiveness to change if there are many watched resources. Change this
value only if you know what you are doing. Defaults to 10 hours if unset.
there will a 10 percent jitter between the SyncPeriod of all controllers
so that all controllers will not send list requests simultaneously.
For more information check this: https://github.com/kubernetes-sigs/controller-runtime/blob/v0.11.2/pkg/manager/manager.go#L134
same problem
my Reconcile triggered at these times
00:00
09:03 (9 hours + 3 min)
18:06 (9 hours + 3 min)
00:09 (9 hours + 3 min)
sync period is not set so it should be default.
kubernetes 1.20.11 version

Patching list in kubernetes manifest with Kustomize

I want to patch (overwrite) list in kubernetes manifest with Kustomize.
I am using patchesStrategicMerge method.
When I patch the parameters which are not in list the patching works as expected - only addressed parameters in patch.yaml are replaced, rest is untouched.
When I patch list the whole list is replaced.
How can I replace only specific items in the list and the res of the items in list stay untouched?
I found these two resources:
https://github.com/kubernetes-sigs/kustomize/issues/581
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-api-machinery/strategic-merge-patch.md
but wasn't able to make desired solution of it.
exmaple code:
orig-file.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmanager-slack-config
namespace: system-namespace
spec:
test: test
other: other-stuff
receivers:
- name: default
slackConfigs:
- name: slack
username: test-user
channel: "#alerts"
sendResolved: true
apiURL:
name: slack-webhook-url
key: address
patch.yaml:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmanager-slack-config
namespace: system-namespace
spec:
test: brase-yourself
receivers:
- name: default
slackConfigs:
- name: slack
username: Karl
kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- orig-file.yaml
patchesStrategicMerge:
- patch.yaml
What I get:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmanager-slack-config
namespace: system-namespace
spec:
other: other-stuff
receivers:
- name: default
slackConfigs:
- name: slack
username: Karl
test: brase-yourself
What I want:
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
name: alertmanager-slack-config
namespace: system-namespace
spec:
other: other-stuff
receivers:
- name: default
slackConfigs:
- name: slack
username: Karl
channel: "#alerts"
sendResolved: true
apiURL:
name: slack-webhook-url
key: address
test: brase-yourself
What you can do is to use jsonpatch instead of patchesStrategicMerge, so in your case:
cat <<EOF >./kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- orig-file.yaml
patches:
- path: patch.yaml
target:
group: monitoring.coreos.com
version: v1alpha1
kind: AlertmanagerConfig
name: alertmanager-slack-config
EOF
patch:
cat <<EOF >./patch.yaml
- op: replace
path: /spec/receivers/0/slackConfigs/0/username
value: Karl
EOF

I wan to modify the http response code and the body from Istio ingress

I have currently written below auth manifest for Istio.
kind: RequestAuthentication
metadata:
name: "jwt-validation"
namespace: some-namespace
spec:
selector:
matchLabels:
auth: required
jwtRules:
- issuer: "https://you.auth0.com/"
jwksUri: "https://you.auth0.com/.well-known/jwks.json"
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
name: jwt-auth-policy
namespace: some-namespace
spec:
selector:
matchLabels:
auth: required
action: DENY
rules:
- from:
- source:
notRequestPrincipals: ["*"]
for which i am getting the below response from browser
RBAC: access denied
But instead of this i wan to get a Json response
saying
{
"status": "failure",
"message": "Not Authorised"
}
with status code 403
Now i have tried the below Lua filter
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
name: custom-filter-response-code
namespace: istio-system
spec:
workloadSelector:
labels:
istio: ingressgateway
configPatches:
- applyTo: HTTP_FILTER
match:
context: GATEWAY
listener:
filterChain:
filter:
name: "envoy.filters.network.http_connection_manager"
subFilter:
name: "envoy.extAuthz"
patch:
operation: INSERT_AFTER
value:
name: envoy.custom-resp
typed_config:
"#type": "type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua"
inlineCode: |
function envoy_on_response(response_handle)
if response_handle:headers():get(":status") == "401" then
response_handle:headers():replace(":status", "403")
else
local body = response_handle:body()
local jsonString = tostring(body:getBytes(0, body:length()))
jsonString = jsonString:gsub("(status|failur)", "(message|Not Authorised)")
response_handle:body():set(jsonString)
end
Please Guide me with correct snippet