We have configured Grok exporter to monitor errors from various system logs. But it seems changes are reflected once we restart the respective grok instance.
Please see the config.yml below:
global:
config_version: 2
input:
type: file
path: /ZAMBAS/logs/Healthcheck/EFT/eftcl.log
readall: true
poll_interval_seconds: 5
grok:
patterns_dir: ./patterns
metrics:
- type: gauge
name: EFTFileTransfers
help: Counter metric example with labels.
match: '%{WORD:Status}\s%{GREEDYDATA:FileTransferTime};\s\\%{WORD:Customer}\\%{WORD:OutboundSystem}\\%{GREEDYDATA:File};\s%{WORD:Operation};\s%{NUMBER:Code}'
value: '{{.Code}}'
cumulative: false
labels:
Customer: '{{.Customer}}'
OutboundSystem: '{{.OutboundSystem}}'
File: '{{.File}}'
Status: '{{.Status}}'
Operation: '{{.Operation}}'
FileTransferTime: '{{.FileTransferTime}}'
- type: gauge
name: EFTFileSuccessfullTransfers
help: Counter metric example with labels.
match: 'Success\s%{GREEDYDATA:Time};\s\\%{WORD:Customer}\\%{WORD:OutboundSystem}\\%{GREEDYDATA:File};\s%{WORD:Operation};\s%{NUMBER:Code}'
value: '{{.Code}}'
cumulative: false
- type: gauge
name: EFTFileFailedTransfers
help: Counter metric example with labels.
match: 'Failed\s%{GREEDYDATA:Time};\s\\%{WORD:Customer}\\%{WORD:OutboundSystem}\\%{GREEDYDATA:File};\s%{WORD:Operation};\s%{NUMBER:Code}'
value: '{{.Code}}'
cumulative: false
server:
port: 9845
Without restart it doesn't reflects correct matching patterns. Once I restart the grok instance it reflects perfectly.
Is there some parameter I am missing here ?
Thanks
Priyotosh
Simply change readall to false in input section will stop process lines multiple times when grok_exporter is restarted. Please see the docs on Github.
Related
I am a beginner who is using Prometheus and Grapana to monitor the value of REST API.
Prometheus, json-exporrter, and grafana both used the Helm chart, Prometheus installed as default values.yaml, and json-exporter installed as custom values.yaml.
I checked that the prometheus set the service monitor of json-exporter as a target, but I couldn't check its metrics.
How can I check the metrics? Below is the environment , screenshots and code.
environment :
kubernetes : v1.22.9
helm : v3.9.2
prometheus-json-exporter helm chart : v0.5.0
kube-prometheus-stack helm chart : 0.58.0
screenshots :
https://drive.google.com/drive/folders/1vfjbidNpE2_yXfxdX8oX5eWh4-wAx7Ql?usp=sharing
values.yaml
in custom_jsonexporter_values.yaml
# Default values for prometheus-json-exporter.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicaCount: 1
image:
repository: quay.io/prometheuscommunity/json-exporter
pullPolicy: IfNotPresent
# Overrides the image tag whose default is the chart appVersion.
tag: ""
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
# Specifies whether a service account should be created
create: true
# Annotations to add to the service account
annotations: []
# The name of the service account to use.
# If not set and create is true, a name is generated using the fullname template
name: ""
podAnnotations: []
podSecurityContext: {}
# fsGroup: 2000
# podLabels:
# Custom labels for the pod
securityContext: {}
# capabilities:
# drop:
# - ALL
# readOnlyRootFilesystem: true
# runAsNonRoot: true
# runAsUser: 1000
service:
type: ClusterIP
port: 7979
targetPort: http
name: http
serviceMonitor:
## If true, a ServiceMonitor CRD is created for a prometheus operator
## https://github.com/coreos/prometheus-operator
##
enabled: true
namespace: monitoring
scheme: http
# Default values that will be used for all ServiceMonitors created by `targets`
defaults:
additionalMetricsRelabels: {}
interval: 60s
labels:
release: prometheus
scrapeTimeout: 60s
targets:
- name : pi2
url: http://xxx.xxx.xxx.xxx:xxxx
labels: {} # Map of labels for ServiceMonitor. Overrides value set in `defaults`
interval: 60s # Scraping interval. Overrides value set in `defaults`
scrapeTimeout: 60s # Scrape timeout. Overrides value set in `defaults`
additionalMetricsRelabels: {} # Map of metric labels and values to add
ingress:
enabled: false
className: ""
annotations: []
# kubernetes.io/ingress.class: nginx
# kubernetes.io/tls-acme: "true"
hosts:
- host: chart-example.local
paths:
- path: /
pathType: ImplementationSpecific
tls: []
# - secretName: chart-example-tls
# hosts:
# - chart-example.local
resources: {}
# We usually recommend not to specify default resources and to leave this as a conscious
# choice for the user. This also increases chances charts run on environments with little
# resources, such as Minikube. If you do want to specify resources, uncomment the following
# lines, adjust them as necessary, and remove the curly braces after 'resources:'.
# limits:
# cpu: 100m
# memory: 128Mi
# requests:
# cpu: 100m
# memory: 128Mi
autoscaling:
enabled: false
minReplicas: 1
maxReplicas: 100
targetCPUUtilizationPercentage: 80
# targetMemoryUtilizationPercentage: 80
nodeSelector: []
tolerations: []
affinity: []
configuration:
config: |
---
modules:
default:
metrics:
- name: used_storage_byte
path: '{ .used }'
help: used storage byte
values:
used : '{ .used }'
labels: {}
- name: free_storage_byte
path: '{ .free }'
help: free storage byte
labels: {}
values :
free : '{ .free }'
- name: total_storage_byte
path: '{ .total }'
help: total storage byte
labels: {}
values :
total : '{ .total }'
prometheusRule:
enabled: false
additionalLabels: {}
namespace: ""
rules: []
additionalVolumes: []
# - name: password-file
# secret:
# secretName: secret-name
additionalVolumeMounts: []
# - name: password-file
# mountPath: "/tmp/mysecret.txt"
# subPath: mysecret.txt
Firstly you can check the targets page on the Prometheus UI to see if a) your desired target is even defined and b) if the endpoint is reachable and being scraped.
However, you may need to troubleshoot a little if either of the above is not the case:
It is important to understand what is happening. You have deployed a Prometheus Operator to the cluster. If you have used the default values from the helm chart, you also deployed a Prometheus custom resource(CR). This instance is what is telling the Prometheus Operator how to ultimately configure the Prometheus running inside the pod. Certain things are static, like global metric relabeling for example, but most are dynamic, such as picking up new targets to actually scrape. Inside the Prometheus CR you will find options to specify serviceMonitorSelector and serviceMonitorNamespaceSelector (The behaviour is the same also for probes and podmonitors so I'm just going over it once). Assuming you leave the default set like serviceMonitorNamespaceSelector: {}, Prometheus Operator will look for ServiceMonitors in all namespaces on the cluster to which it has access via its serviceAccount. The serviceMonitorSelector field lets you specify a label and value combination that must be present on a serviceMonitor that must be present for it to be picked up. Once a or multiple serviceMonitors are found, that match the criteria in the selectors, Prometheus Operator adjusts the configuration in the actual Prometheus instance(tl;dr version) so you end up with proper scrape targets.
Step 1 for trouble shooting: Do your selectors match the labels and namespace of the serviceMonitors? Actually check those. The default on the prometheus operator helm chart expects a label release: prometheus-operator and in your config, you don't seem to add that to your json-exporter's serviceMonitor.
Step 2: The same behaviour as outline for how serviceMonitors are picked up, is happening in turn inside the serviceMonitor itself, so make sure that your service actually matches what is specced out in the serviceMonitor.
To deep dive further into the options you have and what the fields do, check the API documentation.
Having setup Kibana and a fleet server, I now have attempted to add APM.
When going through the general setup - I forever get an error no matter what is done:
failed to listen:listen tcp *.*.*.*:8200: bind: can't assign requested address
This is when following the steps for setup of APM having created the fleet server.
This is all being launched in Kubernetes and the documentation has been gone through several times to no avail.
We did discover that we can hit the
/intake/v2/events
etc endpoints when shelled into the container but 404 for everything else. Its close but no cigar so far following the instructions.
As it turned out, the general walk through is soon to be depreciated in its current form as is.
And setup is far far simpler in a helm file where its actually possible to configure kibana with package ref for your named apm service.
xpack.fleet.packages:
- name: system
version: latest
- name: elastic_agent
version: latest
- name: fleet_server
version: latest
- name: apm
version: latest
xpack.fleet.agentPolicies:
- name: Fleet Server on ECK policy
id: eck-fleet-server
is_default_fleet_server: true
namespace: default
monitoring_enabled:
- logs
- metrics
unenroll_timeout: 900
package_policies:
- name: fleet_server-1
id: fleet_server-1
package:
name: fleet_server
- name: Elastic Agent on ECK policy
id: eck-agent
namespace: default
monitoring_enabled:
- logs
- metrics
unenroll_timeout: 900
is_default: true
package_policies:
- name: system-1
id: system-1
package:
name: system
- package:
name: apm
name: apm-1
inputs:
- type: apm
enabled: true
vars:
- name: host
value: 0.0.0.0:8200
Making sure these are set in the kibana helm file will allow any spun up fleet server to automatically register as having APM.
The missing key in seemingly all the documentation is the need of a APM service.
The simplest example of which is here:
Example yaml scripts
I am trying to create an alarm for a sagemaker endpoint using cloudformation. My endpoint has two variants. My cloud formation file looks similar to below:
MySagemakerAlarmCPUUtilization:
Type: 'AWS::CloudWatch::Alarm'
Properties:
AlarmName: MySagemakerAlarmCPUUtilization
AlarmDescription: Monitor the CPU levels of the endpoint
MetricName: CPUUtilization
ComparisonOperator: GreaterThanThreshold
Dimension:
- Name: EndpointName
Value: my-endpoint
- Name: VariantName
Value: variant1
Namespace: AWS/SageMaker/Endpoints
EvaluationPeriods: 1
Period: 600
Statistic: Average
Threshold: 50
I am having an issue though with the dimension part. I get an invalid property error here. Does anyone know the correct syntax to look at a particular variant of an endpoint in cloud formation?
Realised I just had a typo in this. It should read Dimensions. So:
Dimensions:
- Name: EndpointName
Value: my-endpoint
- Name: VariantName
Value: variant1
But the code is right if anyone else wanted to use it
I've been making some tests with a Kubernetes cluster and I installed the loki-promtail stack by means of the helm loki/loki-stack chart.
The default configuration works fine, but now I would like to add some custom behaviour to the standard promtail config.
According to the Promtail documentation I tried to customise the values.xml in this way:
promtail:
extraScrapeConfigs:
- job_name: dlq-reader
kubernetes_sd_configs:
- role: pod
pipeline_stages:
- template:
source: new_key
template: 'test'
- output:
source: new_key
The expected behaviour is that every log line is replaced by the static text "test" (of course this is a silly test just to get familiar with this environment).
What I see is that this configuration is correctly applied to the loki config-map but without any effect: the log lines looks exactly as if this additional configuration wasn't there.
The loki-stack chart version is 0.39.0 which installs loki 1.5.0.
I cannot see any error in the loki/promtails logs... Any suggestion?
I finally discovered the issue then I post what I found in case this might help anyone else with the same issue.
In order to modify the log text or to add custom labels, the correct values.yaml section to provide is pipelineStages instead of extraScrapeConfigs. Then, the previous snippet must be changed in the following way:
promtail:
pipelineStages:
- docker: {}
- match:
selector: '{container="dlq-reader"}'
stages:
- template:
source: new_key
template: 'test'
- output:
source: new_key
I am trying to create a yaml file to deploy gke cluster in a custom network I created. I get an error
JSON payload received. Unknown name \"network\": Cannot find field."
I have tried a few names for the resources but I am still seeing the same issue
resources:
- name: myclus
type: container.v1.cluster
properties:
network: projects/project-251012/global/networks/dev-cloud
zone: "us-east4-a"
cluster:
initialClusterVersion: "1.12.9-gke.13"
currentMasterVersion: "1.12.9-gke.13"
## Initial NodePool config.
nodePools:
- name: "myclus-pool1"
initialNodeCount: 3
version: "1.12.9-gke.13"
config:
machineType: "n1-standard-1"
oauthScopes:
- https://www.googleapis.com/auth/logging.write
- https://www.googleapis.com/auth/monitoring
- https://www.googleapis.com/auth/ndev.clouddns.readwrite
preemptible: true
## Duplicates node pool config from v1.cluster section, to get it explicitly managed.
- name: myclus-pool1
type: container.v1.nodePool
properties:
zone: us-east4-a
clusterId: $(ref.myclus.name)
nodePool:
name: "myclus-pool1"
I expect it to place the cluster nodes in this network.
The network field needs to be part of the cluster spec. The top-level of properties should just be zone and cluster, network should be on the same indentation as initialClusterVersion. See more on the container.v1.cluster API reference page
Your manifest should look more like:
EDIT: there is some confusion in the API reference docs concerning deprecated fields. I offered a YAML that applies to the new API, not the one you are using. I've update with the correct syntax for the basic v1 API and further down I've added the newer API (which currently relies on gcp-types to deploy.
resources:
- name: myclus
type: container.v1.cluster
properties:
projectId: [project]
zone: us-central1-f
cluster:
name: my-clus
zone: us-central1-f
network: [network_name]
subnetwork: [subnet] ### leave this field blank if using the default network
initialClusterVersion: "1.13"
nodePools:
- name: my-clus-pool1
initialNodeCount: 0
config:
imageType: cos
- name: my-pool-1
type: container.v1.nodePool
properties:
projectId: [project]
zone: us-central1-f
clusterId: $(ref.myclus.name)
nodePool:
name: my-clus-pool2
initialNodeCount: 0
version: "1.13"
config:
imageType: ubuntu
The newer API (which provides more functionality and allows you to use more features including the v1beta1 API and beta features) would look something like this:
resources:
- name: myclus
type: gcp-types/container-v1:projects.locations.clusters
properties:
parent: projects/shared-vpc-231717/locations/us-central1-f
cluster:
name: my-clus
zone: us-central1-f
network: shared-vpc
subnetwork: local-only ### leave this field blank if using the default network
initialClusterVersion: "1.13"
nodePools:
- name: my-clus-pool1
initialNodeCount: 0
config:
imageType: cos
- name: my-pool-2
type: gcp-types/container-v1:projects.locations.clusters.nodePools
properties:
parent: projects/shared-vpc-231717/locations/us-central1-f/clusters/$(ref.myclus.name)
nodePool:
name: my-clus-separate-pool
initialNodeCount: 0
version: "1.13"
config:
imageType: ubuntu
Another note, you may want to modify your scopes, the current scopes will not allow you to pull images from gcr.io, some system pods may not spin up properly and if you are using Google's repository, you will be unable to pull those images.
Finally, you don't want to repeat the node pool resource in both the cluster spec and separately. Instead, create the cluster with a basic (default) node pool, for all additional node pools, create them as separate resources to manage them without going through the cluster. There are very few updates you can perform on a node pool, asside from resizing