Filebeat with custom logfile and kubernetes metadata to elasticsearch and kibana - kubernetes

Our applications are deployed in AWS EKS cluster, and for certain reasons we need to write our app logs to separate file lets say ${POD_NAME}.applog instead of stdout (we mounted /var/log/container/ to the pod /log folder and app writes /log/${POD_NAME}.applog ). And we are using filebeat to send the logs to Elasticsearch and we are using Kibana for visualization. Our filebeat config file looks like this
data:
filebeat.yml: |-
filebeat.inputs:
- type: log
paths:
- /var/log/containers/*.applog
json.keys_under_root: true
json.message_key: log
processors:
- add_cloud_metadata:
- add_host_metadata:
This is working fine, but we realised we are missing the kuberenetes metadata in ES and Kibana. But we are getting kuberenetes metadata when we include -type: conatainer.
data:
filebeat.yml: |-
filebeat.inputs:
- type: log
paths:
- /var/log/containers/*.applog
json.keys_under_root: true
json.message_key: log
- type: container
paths:
- /var/log/containers/*.log
processors:
- add_kubernetes_metadata:
host: ${NODE_NAME}
matchers:
- logs_path:
logs_path: "/var/log/containers/"
So we tried adding the config like this
data:
filebeat.yml: |-
filebeat.inputs:
- type: log
paths:
- /var/log/containers/*.applog
json.keys_under_root: true
json.message_key: log
processors:
- add_kubernetes_metadata:
in_cluster: true
host: ${NODE_NAME}
- add_cloud_metadata:
- add_host_metadata:
Still we are not getting the kuberenetes metadata in kibana. I tried with all trial and error method, but nothing works.
Can someone please help me how to get Kubernetes metadata with custom logfile in filebeat.

We meet the same issue under EKS 1.24, there is no kubernetes metadata in the log entry. Per doc Add Kubernetes metadataedit, we solve it through the following config under filebeat version 7.17.8.
- type: container
paths:
- /var/log/pods/*/*/*.log
exclude_files: ['filebeat.*',
'logstash.*',
'kube.*',
'cert-manager.*']
processors:
- add_kubernetes_metadata:
in_cluster: true
host: ${NODE_NAME}
matchers:
- logs_path:
logs_path: '/var/log/pods/'
default_indexers.enabled: false
default_matchers.enabled: false
indexers:
- pod_uid:
matchers:
- logs_path:
logs_path: '/var/log/pods/'
resource_type: 'pod'
The default indexer and matcher should be disabled. And the pod_uid indexers could identify the pod metadata using the UID of the pod. You could find the logs_path matchers config from here

Related

Is EFS a good logs backup option if Loki pod terminated accidentally in EKS Fargate

I am currently using Loki to store logs generated by my applications from EKS Fargate. Sidecar pattern with promtail is used to scrape logs. Single Loki pod is used and S3 is configured as a destination to store logs. It works nicely as expected. However, when I tested the availability of the logging system by deleting pods, I discovered that if Loki’s pod was deleted, some logs would be missing (range around 20 mins before the pod was deleted to the time the pod was deleted) even after the pod restarted.
To solve this problem, I tried to use EFS as the persistent volume of Loki’ pod, mounting the path /loki. The whole process is followed by this article (https://aws.amazon.com/blogs/aws/new-aws-fargate-for-amazon-eks-now-supports-amazon-efs/). But I have got an error from the Loki pod with msg "error running loki" err="mkdir /loki/compactor: permission denied”
Therefore, I have 2 questions in my mind:
Should I use EFS as a solution for log backup in my case?
Why did I get a permission denied inside the pod, any ways to solve this problem?
My Loki-config.yaml
auth_enabled: false
server:
http_listen_port: 3100
# grpc_listen_port: 9096
ingester:
wal:
enabled: true
dir: /loki/wal
lifecycler:
ring:
kvstore:
store: inmemory
replication_factor: 1
# final_sleep: 0s
chunk_idle_period: 3m
chunk_retain_period: 30s
max_transfer_retries: 0
chunk_target_size: 1048576
schema_config:
configs:
- from: 2020-05-15
store: boltdb-shipper
object_store: aws
schema: v11
index:
prefix: index_
period: 24h
storage_config:
boltdb_shipper:
active_index_directory: /loki/index
cache_location: /loki/index_cache
shared_store: s3
aws:
bucketnames: bucketnames
endpoint: s3.us-west-2.amazonaws.com
region: us-west-2
access_key_id: access_key_id
secret_access_key: secret_access_key
sse_encryption: true
compactor:
working_directory: /loki/compactor
shared_store: s3
compaction_interval: 5m
limits_config:
reject_old_samples: true
reject_old_samples_max_age: 48h
chunk_store_config:
max_look_back_period: 0s
table_manager:
retention_deletes_enabled: true
retention_period: 96h
querier:
query_ingesters_within: 0
analytics:
reporting_enabled: false
Deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
namespace: fargate-api-dev
name: dev-loki
spec:
selector:
matchLabels:
app: dev-loki
template:
metadata:
labels:
app: dev-loki
spec:
volumes:
- name: loki-config
configMap:
name: dev-loki-config
- name: dev-loki-efs-pv
persistentVolumeClaim:
claimName: dev-loki-efs-pvc
containers:
- name: loki
image: loki:2.6.1
args:
- -print-config-stderr=true
- -config.file=/tmp/loki.yaml
resources:
limits:
memory: "500Mi"
cpu: "200m"
ports:
- containerPort: 3100
volumeMounts:
- name: dev-loki-config
mountPath: /tmp
readOnly: false
- name: dev-loki-efs-pv
mountPath: /loki
Promtail-config.yaml
server:
log_level: info
http_listen_port: 9080
clients:
- url: http://loki.com/loki/api/v1/push
positions:
filename: /run/promtail/positions.yaml
scrape_configs:
- job_name: api-log
static_configs:
- targets:
- localhost
labels:
job: apilogs
pod: ${POD_NAME}
__path__: /var/log/*.log
I had a similar issue using EFS as volume to store the logs and I found this solution https://github.com/grafana/loki/issues/2018#issuecomment-1030221498
Basically loki container by it's own is not able to create a directory to start working, so we used a initcotainer to do it for it.
This solution worked like a charm for.

How configure properly Grafana Tempo?

I tried to use Grafana Tempo for distributed tracing.
I launch it from docker-compose:
version: "3.9"
services:
# MY MICROSERVICES
...
prometheus:
image: prom/prometheus
ports:
- ${PROMETHEUS_EXTERNAL_PORT}:9090
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:cached
promtail:
image: grafana/promtail
volumes:
- ./log:/var/log
- ./promtail/:/mnt/config
command: -config.file=/mnt/config/promtail-config.yaml
loki:
image: grafana/loki
command: -config.file=/etc/loki/local-config.yaml
tempo:
image: grafana/tempo
command: [ "-config.file=/etc/tempo.yaml" ]
volumes:
- ./tempo/tempo-local.yaml:/etc/tempo.yaml
# - ./tempo-data/:/tmp/tempo
ports:
- "14268" # jaeger ingest
- "3200" # tempo
- "55680" # otlp grpc
- "55681" # otlp http
- "9411" # zipkin
tempo-query:
image: grafana/tempo-query
command: [ "--grpc-storage-plugin.configuration-file=/etc/tempo-query.yaml" ]
volumes:
- ./tempo/tempo-query.yaml:/etc/tempo-query.yaml
ports:
- "16686:16686" # jaeger-ui
depends_on:
- tempo
grafana:
image: grafana/grafana
volumes:
- ./grafana/datasource-config/:/etc/grafana/provisioning/datasources:cached
- ./grafana/dashboards/prometheus.json:/var/lib/grafana/dashboards/prometheus.json:cached
- ./grafana/dashboards/loki.json:/var/lib/grafana/dashboards/loki.json:cached
- ./grafana/dashboards-config/:/etc/grafana/provisioning/dashboards:cached
ports:
- ${GRAFANA_EXTERNAL_PORT}:3000
environment:
- GF_AUTH_ANONYMOUS_ENABLED=true
- GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
- GF_AUTH_DISABLE_LOGIN_FORM=true
depends_on:
- prometheus
- loki
with tempo-local.yaml:
server:
http_listen_port: 3200
distributor:
receivers: # this configuration will listen on all ports and protocols that tempo is capable of.
jaeger: # the receives all come from the OpenTelemetry collector. more configuration information can
protocols: # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver
thrift_http: #
grpc: # for a production deployment you should only enable the receivers you need!
thrift_binary:
thrift_compact:
zipkin:
otlp:
protocols:
http:
grpc:
opencensus:
ingester:
trace_idle_period: 10s # the length of time after a trace has not received spans to consider it complete and flush it
max_block_bytes: 1_000_000 # cut the head block when it hits this size or ...
max_block_duration: 5m # this much time passes
compactor:
compaction:
compaction_window: 1h # blocks in this time window will be compacted together
max_block_bytes: 100_000_000 # maximum size of compacted blocks
block_retention: 1h
compacted_block_retention: 10m
storage:
trace:
backend: local # backend configuration to use
block:
bloom_filter_false_positive: .05 # bloom filter false positive rate. lower values create larger filters but fewer false positives
index_downsample_bytes: 1000 # number of bytes per index record
encoding: zstd # block encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
wal:
path: /tmp/tempo/wal # where to store the the wal locally
encoding: snappy # wal encoding/compression. options: none, gzip, lz4-64k, lz4-256k, lz4-1M, lz4, snappy, zstd, s2
local:
path: /tmp/tempo/blocks
pool:
max_workers: 100 # worker pool determines the number of parallel requests to the object store backend
queue_depth: 10000
tempo-query.yaml:
backend: "tempo:3200"
and datasource.yml for instrumenting datasources on grafana:
apiVersion: 1
deleteDatasources:
- name: Prometheus
- name: Tempo
- name: Loki
datasources:
- name: Prometheus
type: prometheus
access: proxy
orgId: 1
url: http://prometheus:9090
basicAuth: false
isDefault: false
version: 1
editable: false
- name: Tempo
type: tempo
access: proxy
orgId: 1
url: http://tempo-query:16686
basicAuth: false
isDefault: false
version: 1
editable: false
apiVersion: 1
uid: tempo
- name: Tempo-Multitenant
type: tempo
access: proxy
orgId: 1
url: http://tempo-query:16686
basicAuth: false
isDefault: false
version: 1
editable: false
apiVersion: 1
uid: tempo-authed
jsonData:
httpHeaderName1: 'Authorization'
secureJsonData:
httpHeaderValue1: 'Bearer foo-bar-baz'
- name: Loki
type: loki
access: proxy
orgId: 1
url: http://loki:3100
basicAuth: false
isDefault: false
version: 1
editable: false
apiVersion: 1
jsonData:
derivedFields:
- datasourceUid: tempo
matcherRegex: \[.+,(.+),.+\]
name: TraceID
url: $${__value.raw}
But if I test the datasource in grafana, I have this error:
In Loki view I can found the Tempo button for seeing traces... but I can't see it on Tempo because I have an error:
Anyway if I take the trace id and I search it on Jaeger, I can see it correctly.
What I missing in configuration for Tempo? How configure it correctly?
Grafana 7.5 and later can talk to Tempo natively, and no longer need the tempo-query proxy. I think this explains what is happening, Grafana is attempting to use the Tempo-native API against tempo-query, which exposes the Jaeger API instead. Try changing the Grafana datasource in datasource.yml to http://tempo:3200.
This solution applies to the tempo installation via the kubernetes helm charts (so it applies to the question's title but not the exact question):
Use the url http://helmreleasename-tempo:3100 for configuring the tempo datasource in grafana. You need to check your tempo service name in kubernetes and use the port 3100.

Filebeat on Kubernetes modules are not working

I am using this guide to run filebeat on a Kubernetes cluster.
https://www.elastic.co/guide/en/beats/filebeat/master/running-on-kubernetes.html#_kubernetes_deploy_manifests
filebeat version: 6.6.0
I updated config file with:
filebeat.yml: |-
filebeat.config:
inputs:
# Mounted `filebeat-inputs` configmap:
path: ${path.config}/inputs.d/*.yml
# Reload inputs configs as they change:
reload.enabled: false
modules:
path: ${path.config}/modules.d/*.yml
# Reload module configs as they change:
reload.enabled: false
# To enable hints based autodiscover, remove `filebeat.config.inputs` configuration and uncomment this:
#filebeat.autodiscover:
# providers:
# - type: kubernetes
# hints.enabled: true
filebeat.modules:
- module: nginx
access:
enabled: true
var.paths: ["/var/log/nginx/access.log*"]
- module: apache2
access:
enabled: true
var.paths: ["/var/log/apache2/access.log*"]
error:
enabled: true
var.paths: ["/var/log/apache2/error.log*"]
But, the logs from the PHP application (/var/log/apache2/error.log) are not being fetched by filebeat. I checked by execing into the filebeat pod and I see that apache2 and nginx modules are not enabled.
How can I set it up correctly in above yaml file.
UPDATE
I updated filebeat config file with below settings:
filebeat.autodiscover:
providers:
- type: kubernetes
hints.enabled: true
templates:
- condition:
config:
- type: docker
containers.ids:
- "${data.kubernetes.container.id}"
exclude_lines: ["^\\s+[\\-`('.|_]"] # drop asciiart lines
- condition:
equals:
kubernetes.labels.app: "my-apache-app"
config:
- module: apache2
log:
input:
type: docker
containers.ids:
- "${data.kubernetes.container.id}"
apiVersion: v1
kind: ConfigMap
metadata:
name: filebeat-modules
namespace: default
labels:
k8s-app: filebeat
data:
apache2.yml: |-
- module: apache2
access:
enabled: true
error:
enabled: true
nginx.yml: |-
- module: nginx
access:
enabled: true
Now, I am logging apache errors in /dev/stderr so that I can see it thru kubectl logs. Logs are fetching over kibana dashboard. But, apache module is still noe visible.
I tried checking with ./filebeat modules list:
Enabled:
apache2
nginx
Disabled:
Kibana Dashboard

k8s scdf2 how config volumenMount in a task (no freetext)

Deploying a task, as user, i need config k8s params like i do using "freetext".
The k8s config is following
Secret: "kind": "Secret","apiVersion": "v1","metadata": {"name": "omni-secret","namespace": "default",
bootstrap.yml:
spring:
application:
name: mk-adobe-analytics-task
cloud:
kubernetes:
config:
enabled: false
secrets:
enabled: true
namespace: default
paths:
- /etc/secret-volume
log.info(AdobeAnalyticsConstants.LOG_RECOVERING_SECRET, env.getProperty("aws.bucketname"));
Deploying task:
task launch test-007 --properties "deployer.*.kubernetes.volumeMounts=[{name: secret-volume, mountPath: '/etc/secret-volume'}], deployer.* .kubernetes.volumes=[{name: 'secret-volume', secret: {secretName: 'omni-secret' }}]"
Result:
2019-06-10 10:32:50.852 INFO 1 --- Recovering property "aws.bucketname": null
How can i map into a task the k8s volumens? simply k8s deploy , it is ok using streams
it's not clear how to start with your issue but please take a look for Kubernetes PropertySource implementations.
Inside "Secrets PropertySource - Table 3.2. Properties" you can find other settings like:
- spring.cloud.kubernetes.secrets.name
- spring.cloud.kubernetes.secrets.labels
- spring.cloud.kubernetes.secrets.enableApi
So please refer to the documentation.
It's also possible that your environment variable aws.bucketname wasn't configured properly.
Hope this help.

How to configure prometheus with alertmanager?

docker-compose.yml:
This is the docker-compose to run the prometheus, node-exporter and alert-manager service. All the services are running great. Even the health status in target menu of prometheus shows ok.
version: '2'
services:
prometheus:
image: prom/prometheus
privileged: true
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- ./alertmanger/alert.rules:/alert.rules
command:
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- '9090:9090'
node-exporter:
image: prom/node-exporter
ports:
- '9100:9100'
alertmanager:
image: prom/alertmanager
privileged: true
volumes:
- ./alertmanager/alertmanager.yml:/alertmanager.yml
command:
- '--config.file=/alertmanager.yml'
ports:
- '9093:9093'
prometheus.yml
This is the prometheus config file with targets and alerts target sets. The alertmanager target url is working fine.
global:
scrape_interval: 5s
external_labels:
monitor: 'my-monitor'
# this is where I have simple alert rules
rule_files:
- ./alertmanager/alert.rules
scrape_configs:
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
- job_name: 'node-exporter'
static_configs:
- targets: ['node-exporter:9100']
alerting:
alertmanagers:
- static_configs:
- targets: ['some-ip:9093']
alert.rules:
Just a simple alert rules to show alert when service is down
ALERT service_down
IF up == 0
alertmanager.yml
This is to send the message on slack when alerting occurs.
global:
slack_api_url: 'https://api.slack.com/apps/A90S3Q753'
route:
receiver: 'slack'
receivers:
- name: 'slack'
slack_configs:
- send_resolved: true
username: 'tara gurung'
channel: '#general'
api_url: 'https://hooks.slack.com/services/T52GRFN3F/B90NMV1U2/QKj1pZu3ZVY0QONyI5sfsdf'
Problems:
All the containers are working fine I am not able to figure out the exact problem.What am I really missing. Checking the alerts in prometheus shows.
Alerts
No alerting rules defined
Your ./alertmanager/alert.rules file is not included in your docker config, so it is not available in the container. You need to add it to the prometheus service:
prometheus:
image: prom/prometheus
privileged: true
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- ./alertmanager/alert.rules:/alertmanager/alert.rules
command:
- '--config.file=/etc/prometheus/prometheus.yml'
ports:
- '9090:9090'
And probably give an absolute path inside prometheus.yml:
rule_files:
- "/alertmanager/alert.rules"
You also need to make sure you alerting rules are valid. Please see the prometheus docs for details and examples. You alert.rules file should look something like this:
groups:
- name: example
rules:
# Alert for any instance that is unreachable for >5 minutes.
- alert: InstanceDown
expr: up == 0
for: 5m
Once you have multiple files, it may be better to add the entire directory as a volume rather than individual files.
If you need answers to this question see the explanation on this link
How to make alert rules visible on Prometheus User Interface?
Your alert rules inside the prometheus.yml should look like this
rule_files:
- "/etc/prometheus/alert.rules.yml"
You need to stop the alertmanager and prometheus containers and run this
docker run -d --name prometheus_ops -p 9191:9090 -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml -v $(pwd)/alert.rules.yml:/etc/prometheus/alert.rules.yml prom/prometheus
Verify if you can see the alert.rule config path : Prometheus container ID and go to cd /etc/prometheus
docker exec -it fa99f733f69b sh