Nginx ingress controller ModSecurity (OWASP ruleset) high latency response - Kubernetes
In our AWS EKS environment, I deployed the Nginx ingress controller with Helm, following the official install guide, and added a ConfigMap that enables the ModSecurity WAF with the OWASP CRS v3.3.0 ruleset. The controller sits behind an AWS NLB.
Requests to the environment are now being processed with high latency, but only the first request from a given IP; subsequent requests from the same IP respond quickly.
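To make the pattern measurable, a quick probe (a sketch, not part of the deployment) can time consecutive requests to the same endpoint so a first-request penalty stands out; the URL passed in would be the NLB hostname, nothing here is specific to this cluster:

```python
import time
import urllib.request


def time_requests(url, n=3):
    """Return the total wall-clock seconds for n consecutive GETs."""
    timings = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()  # drain the body so the full response is timed
        timings.append(time.perf_counter() - start)
    return timings
```

Running `time_requests("https://<nlb-hostname>/")` from a fresh source IP should show the first element noticeably larger than the rest if the per-IP pattern holds.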
nginx-values.yaml
---
controller:
  config:
    use-proxy-protocol: "true"
    enable-modsecurity: "true"
    ssl-protocols: "TLSv1.2 TLSv1.3"
    ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384"
  service:
    enableHttps: true
    enableHttp: false
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: external
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
      service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: preserve_client_ip.enabled=true
      service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "10"
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-timeout: "3"
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
      service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
      service.beta.kubernetes.io/load-balancer-source-ranges: ${source_range}
  metrics:
    enabled: true
  extraInitContainers:
    - name: init
      image: alpine:3
      command: ["/bin/sh", "-c"]
      args: ["ls -tla /opt/modsecurity/var; chown -R 101:101 /opt/modsecurity/var; ls -tla /opt/modsecurity/var; touch /opt/modsecurity/var/log/debug.log; chown -R 101:101 /opt/modsecurity/var"]
      volumeMounts:
        - name: log
          mountPath: /opt/modsecurity/var/log
      securityContext:
        runAsGroup: 0
        runAsNonRoot: false
        runAsUser: 0
        privileged: true
  extraContainers:
    - name: promtail
      image: grafana/promtail
      args:
        - -config.file=/etc/config-waf/promtail.yaml
      volumeMounts:
        - name: config-map
          mountPath: /etc/config-waf
        - name: log
          mountPath: /opt/modsecurity/var/log
      resources:
        limits:
          cpu: 100m
          memory: 256Mi
        requests:
          cpu: 100m
          memory: 256Mi
  extraVolumeMounts:
    - name: config-map
      mountPath: /etc/nginx/modsecurity
    - name: log
      mountPath: /opt/modsecurity/var/log
  extraVolumes:
    - name: config-map
      configMap:
        name: waf-config
    - name: log
      emptyDir: {}
    - name: audit
      emptyDir: {}
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
  autoscalingTemplate:
    - type: Pods
      pods:
        metric:
          name: nginx_ingress_controller_nginx_process_requests_total
        target:
          type: AverageValue
          averageValue: 10000m
defaultBackend:
  enabled: true
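One way to isolate whether ModSecurity (rather than the NLB, proxy protocol, or the TLS handshake) is responsible for the delay: instead of enabling it globally through `enable-modsecurity` in `controller.config`, ingress-nginx also supports enabling it per-Ingress with annotations, so a single test route can carry the WAF while the rest stay clean. A sketch, with hypothetical Ingress, host, and backend names:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: waf-test          # hypothetical test route
  annotations:
    nginx.ingress.kubernetes.io/enable-modsecurity: "true"
    nginx.ingress.kubernetes.io/enable-owasp-core-rules: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: waf-test.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: echo        # hypothetical backend service
                port:
                  number: 80
```

Comparing latency against an identical Ingress without the two ModSecurity annotations would show how much of the delay the WAF itself contributes.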
waf.conf
SecRuleEngine On
SecRequestBodyAccess On
SecRule REQUEST_HEADERS:Content-Type "(?:application(?:/soap\+|/)|text/)xml" \
"id:'200000',phase:1,t:none,t:lowercase,pass,nolog,ctl:requestBodyProcessor=XML"
SecRule REQUEST_HEADERS:Content-Type "application/json" \
"id:'200001',phase:1,t:none,t:lowercase,pass,nolog,ctl:requestBodyProcessor=JSON"
SecRequestBodyLimit 13107200
SecRequestBodyNoFilesLimit 131072
SecRequestBodyLimitAction Reject
SecRule REQBODY_ERROR "!@eq 0" \
"id:'200002', phase:2,t:none,log,deny,status:400,msg:'Failed to parse request body.',logdata:'%{reqbody_error_msg}',severity:2"
SecRule MULTIPART_STRICT_ERROR "!@eq 0" \
"id:'200003',phase:2,t:none,log,deny,status:400, \
msg:'Multipart request body failed strict validation: \
PE %{REQBODY_PROCESSOR_ERROR}, \
BQ %{MULTIPART_BOUNDARY_QUOTED}, \
BW %{MULTIPART_BOUNDARY_WHITESPACE}, \
DB %{MULTIPART_DATA_BEFORE}, \
DA %{MULTIPART_DATA_AFTER}, \
HF %{MULTIPART_HEADER_FOLDING}, \
LF %{MULTIPART_LF_LINE}, \
SM %{MULTIPART_MISSING_SEMICOLON}, \
IQ %{MULTIPART_INVALID_QUOTING}, \
IP %{MULTIPART_INVALID_PART}, \
IH %{MULTIPART_INVALID_HEADER_FOLDING}, \
FL %{MULTIPART_FILE_LIMIT_EXCEEDED}'"
SecRule MULTIPART_UNMATCHED_BOUNDARY "@eq 1" \
"id:'200004',phase:2,t:none,log,deny,msg:'Multipart parser detected a possible unmatched boundary.'"
SecPcreMatchLimit 1000
SecPcreMatchLimitRecursion 1000
SecRule TX:/^MSC_/ "!@streq 0" \
"id:'200005',phase:2,t:none,deny,msg:'ModSecurity internal error flagged: %{MATCHED_VAR_NAME}'"
SecResponseBodyAccess On
SecResponseBodyMimeType text/plain text/html text/xml
SecResponseBodyLimit 524288
SecResponseBodyLimitAction ProcessPartial
SecTmpDir /tmp/
SecDataDir /tmp/
SecDebugLog /opt/modsecurity/var/log/debug.log
SecDebugLogLevel 3
SecAuditEngine Off
SecAuditLogRelevantStatus "^(?:5|4(?!04))"
SecAuditLogParts ABIJDEFHZ
SecAuditLogType Serial
SecAuditLog /opt/modsecurity/var/audit/modsec_audit.log
SecArgumentSeparator &
SecCookieFormat 0
SecUnicodeMapFile unicode.mapping 20127
SecStatusEngine On
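One detail in this waf.conf worth flagging: `SecDebugLogLevel 3` makes ModSecurity write errors, warnings, and notices to the debug log synchronously on each transaction, which adds per-request disk I/O; debug logging is normally reserved for troubleshooting. A minimal tuning sketch (this addresses general overhead, not necessarily the per-IP first-request pattern):

```
# Sketch: disable debug logging once the rules are stable.
# Level 3 logs errors/warnings/notices per transaction; 0 logs
# almost nothing. Raise it again temporarily when debugging rules.
SecDebugLogLevel 0
```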
crs-setup.conf
SecDefaultAction "phase:1,log,auditlog,pass,status:408"
SecDefaultAction "phase:2,log,auditlog,pass,status:408"
SecAction \
"id:900000,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.paranoia_level=1"
SecAction \
"id:900100,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.critical_anomaly_score=5,\
setvar:tx.error_anomaly_score=4,\
setvar:tx.warning_anomaly_score=3,\
setvar:tx.notice_anomaly_score=2"
SecAction \
"id:900110,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.inbound_anomaly_score_threshold=10000,\
setvar:tx.outbound_anomaly_score_threshold=10000"
SecAction \
"id:900700,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:'tx.dos_burst_time_slice=30',\
setvar:'tx.dos_counter_threshold=250',\
setvar:'tx.dos_block_timeout=300'"
SecAction \
"id:900960,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.do_reput_block=1"
SecAction \
"id:900970,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.reput_block_duration=300"
SecCollectionTimeout 600
SecAction \
"id:900990,\
phase:1,\
nolog,\
pass,\
t:none,\
setvar:tx.crs_setup_version=330"
Any thoughts on this?
What do you mean by "high latency"? Is it affecting all requests or only specific ones? Have you tried disabling DoS protection in crs-setup.conf?
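Elaborating on that suggestion: with `tx.do_reput_block=1` and the `tx.dos_*` variables set, the CRS initialization rules create a persistent per-client-IP collection in `SecDataDir` (rule 901321 in CRS 3.3, if I recall the numbering correctly), and `SecCollectionTimeout 600` keeps each entry alive for 10 minutes. Since the collection is initialized on the first request from each IP, this would match the observed pattern. A sketch of the change in crs-setup.conf, under the assumption that collection initialization is the slow path (re-test latency after reloading to confirm):

```
# Sketch: commenting out these SecActions disables CRS DoS protection
# and IP reputation blocking, so the CRS initialization rules no longer
# create a per-IP collection in SecDataDir on the first request.
#SecAction \
#  "id:900700,phase:1,nolog,pass,t:none,\
#  setvar:'tx.dos_burst_time_slice=30',\
#  setvar:'tx.dos_counter_threshold=250',\
#  setvar:'tx.dos_block_timeout=300'"
#SecAction \
#  "id:900960,phase:1,nolog,pass,t:none,\
#  setvar:tx.do_reput_block=1"
#SecAction \
#  "id:900970,phase:1,nolog,pass,t:none,\
#  setvar:tx.reput_block_duration=300"
```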
Related
Getting error in Kubernetes cronjob while using google cloud sdk to upload data on GCS bucket
**Yaml for kubernetes that is first used to create raft backup and then upload into gas bucket** apiVersion: batch/v1beta1 kind: CronJob metadata: labels: app.kubernetes.io/component: raft-backup numenapp: raft-backup name: raft-backup namespace: raft-backup spec: concurrencyPolicy: Forbid failedJobsHistoryLimit: 3 jobTemplate: spec: template: metadata: annotations: vault.security.banzaicloud.io/vault-addr: https://vault.vault-internal.net:8200 labels: app.kubernetes.io/component: raft-backup spec: containers: - args: - | SA_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token); export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login jwt=$SA_TOKEN role=raft-backup); vault operator raft snapshot save /share/vault-raft.snap; echo "snapshot is success" command: ["/bin/sh", "-c"] env: - name: VAULT_ADDR value: https://vault.vault-internl.net:8200 image: vault:1.10.9 imagePullPolicy: Always name: snapshot volumeMounts: - mountPath: /share name: share - args: - -ec - sleep 500 - "until [ -f /share/vault-raft.snap ]; do sleep 5; done;\ngsutil cp /share/vault-raft.snap\ \ gs://raft-backup/vault_raft_$(date +\"\ %Y%m%d_%H%M%S\").snap;\n" command: - /bin/sh image: gcr.io/google.com/cloudsdktool/google-cloud-cli:latest imagePullPolicy: IfNotPresent name: upload securityContext: allowPrivilegeEscalation: false volumeMounts: - mountPath: /share name: share restartPolicy: OnFailure securityContext: fsGroup: 1000 runAsGroup: 1000 runAsUser: 1000 serviceAccountName: raft-backup volumes: - emptyDir: {} name: share schedule: '*/3 * * * *' startingDeadlineSeconds: 60 successfulJobsHistoryLimit: 3 suspend: false Error while running gsutil command inside the upload pod $ gsutil Traceback (most recent call last): File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/configurations/named_configs.py", line 172, in ActiveConfig return ActiveConfig(force_create=True) File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/configurations/named_configs.py", line 492, 
in ActiveConfig config_name = _CreateDefaultConfig(force_create) File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/configurations/named_configs.py", line 640, in _CreateDefaultConfig file_utils.MakeDir(paths.named_config_directory) File "/usr/lib/google-cloud-sdk/lib/googlecloudsdk/core/util/files.py", line 125, in MakeDir os.makedirs(path, mode=mode) File "/usr/bin/../lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/os.py", line 215, in makedirs makedirs(head, exist_ok=exist_ok) File "/usr/bin/../lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/os.py", line 215, in makedirs makedirs(head, exist_ok=exist_ok) File "/usr/bin/../lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/os.py", line 225, in makedirs mkdir(name, mode) OSError: [Errno 30] Read-only file system: '/home/cloudsdk/.config' $ command terminated with exit code 137
OSError: [Errno 30] Read-only file system: '/home/cloudsdk/.config' $ command terminated with exit code 137 It seems you don't give enought permission in your cronJob. Try to change : securityContext: fsGroup: 1000 runAsGroup: 1000 runAsUser: 1000 by : securityContext: privileged: true Tell me if it works or not and we can discuss about it. Edit for complete response : Use this apiVersion: batch/v1 instead of apiVersion: batch/v1beta1
cannot reach grafana loki port with http using traefik
I have been trying to find solutions to this but no luck. all services work internally. I am able to access grafana from browser with tls enabled but I am not able to reach loki port in any way(browser/postman etc.) but I can. I can access to loki api with curl localy if I open port on for the service. but as I understand you need to expose ports from traefik to do that. My compose file: version: "3" services: grafana: labels: - "traefik.http.routers.grafana.entryPoints=port80" - "traefik.http.routers.grafana.rule=host(`${DOMAIN}`)" - "traefik.http.middlewares.grafana-redirect.redirectScheme.scheme=https" - "traefik.http.middlewares.grafana-redirect.redirectScheme.permanent=true" - "traefik.http.routers.grafana.middlewares=grafana-redirect" # SSL endpoint - "traefik.http.routers.grafana-ssl.entryPoints=port443" - "traefik.http.routers.grafana-ssl.rule=host(`${DOMAIN}`)" - "traefik.http.routers.grafana-ssl.tls=true" - "traefik.http.routers.grafana-ssl.tls.certResolver=le-ssl" - "traefik.http.routers.grafana-ssl.service=grafana-ssl" - "traefik.http.services.grafana-ssl.loadBalancer.server.port=3000" image: grafana/grafana:latest # or probably any other version volumes: - grafana-data:/var/lib/grafana environment: - GF_SERVER_ROOT_URL=https://${DOMAIN} - GF_SERVER_DOMAIN=${DOMAIN} - GF_USERS_ALLOW_SIGN_UP=false - GF_SECURITY_ADMIN_USER=${GRAFANAUSER} - GF_SECURITY_ADMIN_PASSWORD=${GRAFANAPASS} networks: - traefik-net loki: image: grafana/loki labels: - "traefik.http.routers.loki-ssl.entryPoints=port3100" - "traefik.http.routers.loki-ssl.rule=host(`${DOMAIN}`)" - "traefik.http.routers.loki-ssl.tls=true" - "traefik.http.routers.loki-ssl.tls.certResolver=le-ssl" - "traefik.http.routers.loki-ssl.service=loki-ssl" - "traefik.http.services.loki-ssl.loadBalancer.server.port=3100" command: -config.file=/etc/loki/config.yaml volumes: - ./loki/config.yml:/etc/loki/config.yaml - loki:/data/loki networks: - traefik-net promtail: image: grafana/promtail:2.3.0 volumes: - 
/var/log:/var/log - ./promtail:/etc/promtail-config/ command: -config.file=/etc/promtail-config/promtail.yml networks: - traefik-net influx: image: influxdb:1.7 # or any other recent version labels: # SSL endpoint - "traefik.http.routers.influx-ssl.entryPoints=port8086" - "traefik.http.routers.influx-ssl.rule=host(`${DOMAIN}`)" - "traefik.http.routers.influx-ssl.tls=true" - "traefik.http.routers.influx-ssl.tls.certResolver=le-ssl" - "traefik.http.routers.influx-ssl.service=influx-ssl" - "traefik.http.services.influx-ssl.loadBalancer.server.port=8086" restart: always volumes: - influx-data:/var/lib/influxdb environment: - INFLUXDB_DB=grafana # set any other to create database on initialization - INFLUXDB_HTTP_ENABLED=true - INFLUXDB_HTTP_AUTH_ENABLED=true - INFLUXDB_ADMIN_USER=&{DB_USER} - INFLUXDB_ADMIN_PASSWORD=&{DB_PASS} networks: - traefik-net traefik: image: traefik:v2.9.1 ports: - "80:80" - "443:443" - "3100:3100" # expose port below only if you need access to the Traefik API - "8080:8080" command: # - "--log.level=DEBUG" - "--api=true" - "--api.dashboard=true" - "--providers.docker=true" - "--entryPoints.port443.address=:443" - "--entryPoints.port80.address=:80" - "--entryPoints.port8086.address=:8086" - "--entryPoints.port3100.address=:3100" - "--certificatesResolvers.le-ssl.acme.tlsChallenge=true" - "--certificatesResolvers.le-ssl.acme.email=${TLS_MAIL}" - "--certificatesResolvers.le-ssl.acme.storage=/letsencrypt/acme.json" volumes: - traefik-data:/letsencrypt/ - /var/run/docker.sock:/var/run/docker.sock networks: - traefik-net volumes: traefik-data: grafana-data: influx-data: loki: networks: traefik-net: loki conf # (default configuration) auth_enabled: false server: http_listen_port: 3100 ingester: lifecycler: address: 127.0.0.1 ring: kvstore: store: inmemory replication_factor: 1 final_sleep: 0s chunk_idle_period: 1h # Any chunk not receiving new logs in this time will be flushed max_chunk_age: 1h # All chunks will be flushed when they hit this age, 
default is 1h chunk_target_size: 1048576 # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first chunk_retain_period: 30s # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m) max_transfer_retries: 0 # Chunk transfers disabled wal: enabled: true dir: /loki/wal common: ring: instance_addr: 0.0.0.0 kvstore: store: inmemory schema_config: configs: - from: 2020-10-24 store: boltdb-shipper object_store: filesystem schema: v11 index: prefix: index_ period: 24h storage_config: boltdb_shipper: active_index_directory: /loki/boltdb-shipper-active cache_location: /loki/boltdb-shipper-cache cache_ttl: 24h # Can be increased for faster performance over longer query periods, uses more disk space shared_store: filesystem filesystem: directory: /loki/chunks compactor: working_directory: /loki/boltdb-shipper-compactor shared_store: filesystem limits_config: reject_old_samples: true reject_old_samples_max_age: 168h ingestion_burst_size_mb: 16 ingestion_rate_mb: 16 chunk_store_config: max_look_back_period: 0s table_manager: retention_deletes_enabled: false retention_period: 0s ruler: storage: type: local local: directory: /loki/rules rule_path: /loki/rules-temp alertmanager_url: localhost ring: kvstore: store: inmemory enable_api: true
How to replace Kubernetes YAML manifests fields with sed?
I am trying to inject an argument --insecure-port=0 into /etc/kubernetes/manifests/kube-apiserver.yaml file using sed, but I am having a trouble getting the indentation correct below the argument - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key apiVersion: v1 kind: Pod metadata: annotations: kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.132.0.43:6443 creationTimestamp: null labels: component: kube-apiserver tier: control-plane name: kube-apiserver namespace: kube-system spec: containers: - command: - kube-apiserver - --advertise-address=10.132.0.43 - --allow-privileged=true - --authorization-mode=Node,RBAC - --client-ca-file=/etc/kubernetes/pki/ca.crt - --enable-admission-plugins=NodeRestriction - --enable-bootstrap-token-auth=true - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key - --etcd-servers=https://127.0.0.1:2379 - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key - --requestheader-allowed-names=front-proxy-client - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt - --requestheader-extra-headers-prefix=X-Remote-Extra- - --requestheader-group-headers=X-Remote-Group - --requestheader-username-headers=X-Remote-User - --secure-port=6443 - --service-account-issuer=https://kubernetes.default.svc.cluster.local - --service-account-key-file=/etc/kubernetes/pki/sa.pub - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key - --service-cluster-ip-range=10.96.0.0/12 - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key image: 
k8s.gcr.io/kube-apiserver:v1.23.4 imagePullPolicy: IfNotPresent Any help?
Single kafka pod keeps restarting
Problem Only one, single pod is failing on k8 node. Readiness and Liveness probes are indicating long resposne time, despite fact, that port is open and traffic between nodes is flowing. Termination error code is 137. It's up immediately after killing, however port 9092 is not opened yet, and whole recreation process of bringing app up is taking about 30 minutes. General environment config We have k8 cluster consists of 6 nodes (2 nodes per each of 3 racks and racks are kept in different DC). Our kafka is deployed with helm in use, and each of its nodes is deployed on different host, due to affinity/anti-affinity. Kafka config - log.cleaner.min.compaction.lag.ms=0 - offsets.topic.num.partitions=50 - log.flush.interval.messages=9223372036854775807 - controller.socket.timeout.ms=30000 - principal.builder.class=null - log.flush.interval.ms=null - min.insync.replicas=1 - num.recovery.threads.per.data.dir=1 - sasl.mechanism.inter.broker.protocol=GSSAPI - fetch.purgatory.purge.interval.requests=1000 - replica.socket.timeout.ms=30000 - message.max.bytes=1048588 - max.connection.creation.rate=2147483647 - connections.max.reauth.ms=0 - log.flush.offset.checkpoint.interval.ms=60000 - zookeeper.clientCnxnSocket=null - quota.window.num=11 - zookeeper.connect=zookeeper-service.kafka-shared-cluster.svc.cluster.local:2181/kafka - authorizer.class.name= - password.encoder.secret=null - num.replica.fetchers=1 - alter.log.dirs.replication.quota.window.size.seconds=1 - log.roll.jitter.hours=0 - password.encoder.old.secret=null - log.cleaner.delete.retention.ms=86400000 - queued.max.requests=500 - log.cleaner.threads=1 - sasl.kerberos.service.name=null - socket.request.max.bytes=104857600 - log.message.timestamp.type=CreateTime - connections.max.idle.ms=600000 - zookeeper.set.acl=false - delegation.token.expiry.time.ms=86400000 - session.timeout.ms=null - max.connections=2147483647 - transaction.state.log.num.partitions=50 - 
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT - log.retention.hours=168 - client.quota.callback.class=null - delete.records.purgatory.purge.interval.requests=1 - log.roll.ms=null - replica.high.watermark.checkpoint.interval.ms=5000 - replication.quota.window.size.seconds=1 - sasl.kerberos.ticket.renew.window.factor=0.8 - zookeeper.connection.timeout.ms=18000 - metrics.recording.level=INFO - password.encoder.cipher.algorithm=AES/CBC/PKCS5Padding - replica.selector.class=null - max.connections.per.ip=2147483647 - background.threads=10 - quota.consumer.default=9223372036854775807 - request.timeout.ms=30000 - log.message.format.version=2.8-IV1 - sasl.login.class=null - log.dir=/tmp/kafka-logs - log.segment.bytes=1073741824 - replica.fetch.response.max.bytes=10485760 - group.max.session.timeout.ms=1800000 - port=9092 - log.segment.delete.delay.ms=60000 - log.retention.minutes=null - log.dirs=/kafka - controlled.shutdown.enable=true - socket.connection.setup.timeout.max.ms=30000 - log.message.timestamp.difference.max.ms=9223372036854775807 - password.encoder.key.length=128 - sasl.login.refresh.min.period.seconds=60 - transaction.abort.timed.out.transaction.cleanup.interval.ms=10000 - sasl.kerberos.kinit.cmd=/usr/bin/kinit - log.cleaner.io.max.bytes.per.second=1.7976931348623157E308 - auto.leader.rebalance.enable=true - leader.imbalance.check.interval.seconds=300 - log.cleaner.min.cleanable.ratio=0.5 - replica.lag.time.max.ms=30000 - num.network.threads=3 - sasl.client.callback.handler.class=null - metrics.num.samples=2 - socket.send.buffer.bytes=102400 - password.encoder.keyfactory.algorithm=null - socket.receive.buffer.bytes=102400 - replica.fetch.min.bytes=1 - broker.rack=null - unclean.leader.election.enable=false - offsets.retention.check.interval.ms=600000 - producer.purgatory.purge.interval.requests=1000 - metrics.sample.window.ms=30000 - log.retention.check.interval.ms=300000 - sasl.login.refresh.window.jitter=0.05 - 
leader.imbalance.per.broker.percentage=10 - controller.quota.window.num=11 - advertised.host.name=null - metric.reporters= - quota.producer.default=9223372036854775807 - auto.create.topics.enable=false - replica.socket.receive.buffer.bytes=65536 - replica.fetch.wait.max.ms=500 - password.encoder.iterations=4096 - default.replication.factor=1 - sasl.kerberos.principal.to.local.rules=DEFAULT - log.preallocate=false - transactional.id.expiration.ms=604800000 - control.plane.listener.name=null - transaction.state.log.replication.factor=3 - num.io.threads=8 - sasl.login.refresh.buffer.seconds=300 - offsets.commit.required.acks=-1 - connection.failed.authentication.delay.ms=100 - delete.topic.enable=true - quota.window.size.seconds=1 - offsets.commit.timeout.ms=5000 - log.cleaner.max.compaction.lag.ms=9223372036854775807 - zookeeper.ssl.enabled.protocols=null - log.retention.ms=604800000 - alter.log.dirs.replication.quota.window.num=11 - log.cleaner.enable=true - offsets.load.buffer.size=5242880 - controlled.shutdown.max.retries=3 - offsets.topic.replication.factor=3 - transaction.state.log.min.isr=1 - sasl.kerberos.ticket.renew.jitter=0.05 - zookeeper.session.timeout.ms=18000 - log.retention.bytes=-1 - controller.quota.window.size.seconds=1 - sasl.jaas.config=null - sasl.kerberos.min.time.before.relogin=60000 - offsets.retention.minutes=10080 - replica.fetch.backoff.ms=1000 - inter.broker.protocol.version=2.8-IV1 - kafka.metrics.reporters= - num.partitions=1 - socket.connection.setup.timeout.ms=10000 - broker.id.generation.enable=true - listeners=PLAINTEXT://:9092,OUTSIDE://:9094 - inter.broker.listener.name=null - alter.config.policy.class.name=null - delegation.token.expiry.check.interval.ms=3600000 - log.flush.scheduler.interval.ms=9223372036854775807 - zookeeper.max.in.flight.requests=10 - log.index.size.max.bytes=10485760 - sasl.login.callback.handler.class=null - replica.fetch.max.bytes=1048576 - sasl.server.callback.handler.class=null - 
log.cleaner.dedupe.buffer.size=134217728 - advertised.port=null - log.cleaner.io.buffer.size=524288 - create.topic.policy.class.name=null - controlled.shutdown.retry.backoff.ms=5000 - security.providers=null - log.roll.hours=168 - log.cleanup.policy=delete - log.flush.start.offset.checkpoint.interval.ms=60000 - host.name= - log.roll.jitter.ms=null - transaction.state.log.segment.bytes=104857600 - offsets.topic.segment.bytes=104857600 - group.initial.rebalance.delay.ms=3000 - log.index.interval.bytes=4096 - log.cleaner.backoff.ms=15000 - ssl.truststore.location=null - offset.metadata.max.bytes=4096 - ssl.keystore.password=null - zookeeper.sync.time.ms=2000 - fetch.max.bytes=57671680 - max.poll.interval.ms=null - compression.type=producer - max.connections.per.ip.overrides= - sasl.login.refresh.window.factor=0.8 - kafka.metrics.polling.interval.secs=10 - max.incremental.fetch.session.cache.slots=1000 - delegation.token.master.key=null - reserved.broker.max.id=1000 - transaction.remove.expired.transaction.cleanup.interval.ms=3600000 - log.message.downconversion.enable=true - transaction.state.log.load.buffer.size=5242880 - sasl.enabled.mechanisms=GSSAPI - num.replica.alter.log.dirs.threads=null - group.min.session.timeout.ms=6000 - log.cleaner.io.buffer.load.factor=0.9 - transaction.max.timeout.ms=900000 - group.max.size=2147483647 - delegation.token.max.lifetime.ms=604800000 - broker.id=0 - offsets.topic.compression.codec=0 - zookeeper.ssl.endpoint.identification.algorithm=HTTPS - replication.quota.window.num=11 - advertised.listeners=PLAINTEXT://:9092,OUTSIDE://kafka-0.kafka.kafka-shared-cluster.svc.cluster.local:9094 - queued.max.request.bytes=-1 What have been verified RAID I/O - no spikes before reboot Zookeeper no logs indicating any connection problem Ping - response time is raising periodically Kafka logging level set to DEBUG - java.io.EOFException what is just DEBUG log, not WARNING or ERROR K8 node logs - nothing significant beside readiness and liveness 
probes Pods config Containers: kafka: Image: wurstmeister/kafka:latest Ports: 9092/TCP, 9094/TCP, 9999/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP State: Running Started: Thu, 10 Feb 2022 16:36:48 +0100 Last State: Terminated Reason: Error Exit Code: 137 Started: Tue, 08 Feb 2022 21:12:26 +0100 Finished: Thu, 10 Feb 2022 16:36:36 +0100 Ready: True Restart Count: 76 Limits: cpu: 24 memory: 64Gi Requests: cpu: 1 memory: 2Gi Liveness: tcp-socket :9092 delay=3600s timeout=5s period=10s #success=1 #failure=3 Readiness: tcp-socket :9092 delay=5s timeout=6s period=10s #success=1 #failure=5 Environment: KAFKA_AUTO_CREATE_TOPICS_ENABLE: false ALLOW_PLAINTEXT_LISTENER: yes BROKER_ID_COMMAND: hostname | awk -F'-' '{print $$NF}' HOSTNAME_COMMAND: hostname -f KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094 KAFKA_LISTENERS: PLAINTEXT://:9092,OUTSIDE://:9094 KAFKA_ZOOKEEPER_CONNECT: zookeeper-service.kafka-shared-cluster.svc.cluster.local:2181/kafka KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT KAFKA_LOG_RETENTION_MS: 604800000 KAFKA_LOG_DIRS: /kafka KAFKA_SESSION_TIMEOUT_MS: 10000 KAFKA_MAX_POLL_INTERVAL_MS: 60000 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 3000 KAFKA_JMX_OPTS: run command properties -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M Java heap size in bytes uintx ErgoHeapSizeLimit = 0 {product} uintx HeapSizePerGCThread = 87241520 {product} uintx InitialHeapSize := 1073741824 {product} uintx LargePageHeapSizeThreshold = 134217728 {product} uintx MaxHeapSize := 17179869184 {product} Question do you have any ideas why only this particular pod is failing, and any suggestion for further steps? 
-- EDIT -- Containers: kafka: Image: wurstmeister/kafka:latest Ports: 9092/TCP, 9094/TCP, 9999/TCP Host Ports: 0/TCP, 0/TCP, 0/TCP State: Running Started: Tue, 08 Mar 2022 17:50:11 +0100 Last State: Terminated Reason: Error Exit Code: 137 Started: Tue, 08 Mar 2022 16:35:38 +0100 Finished: Tue, 08 Mar 2022 17:49:51 +0100 Ready: True Restart Count: 1 Limits: cpu: 24 memory: 64Gi Requests: cpu: 1 memory: 2Gi Liveness: tcp-socket :9092 delay=3600s timeout=5s period=10s #success=1 #failure=3 Readiness: tcp-socket :9092 delay=5s timeout=6s period=10s #success=1 #failure=5 Environment: KAFKA_AUTO_CREATE_TOPICS_ENABLE: false ALLOW_PLAINTEXT_LISTENER: yes BROKER_ID_COMMAND: hostname | awk -F'-' '{print $$NF}' HOSTNAME_COMMAND: hostname -f KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094 KAFKA_LISTENERS: PLAINTEXT://:9092,OUTSIDE://:9094 KAFKA_ZOOKEEPER_CONNECT: zookeeper-service.kafka-shared-cluster.svc.cluster.local:2181/kafka KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,OUTSIDE:PLAINTEXT KAFKA_LOG_RETENTION_MS: 604800000 KAFKA_LOG_DIRS: /kafka KAFKA_SESSION_TIMEOUT_MS: 10000 KAFKA_MAX_POLL_INTERVAL_MS: 60000 KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 3000 KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3 KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 3 KAFKA_HEAP_OPTS: -Xmx6G -Xms6G KAFKA_DEFAULT_REPLICATION_FACTOR: 3 KAFKA_MIN_INSYNC_REPLICAS: 2 KAFKA_REPLICA_LAG_TIME_MAX_MS: 80000 KAFKA_NUM_RECOVERY_THREADS_PER_DATA_DIR: 6 Conditions: Type Status Initialized True Ready True ContainersReady True PodScheduled True Volumes: kafka-data: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: kafka-data-kafka-0 ReadOnly: false jolokia-agent: Type: ConfigMap (a volume populated by a ConfigMap) Name: jolokia-agent Optional: false Volumes: kafka-data: Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: kafka-data-kafka-0 ReadOnly: false 
jolokia-agent: Type: ConfigMap (a volume populated by a ConfigMap) Name: jolokia-agent Optional: false default-token-d57jd: Type: Secret (a volume populated by a Secret) SecretName: default-token-d57jd Optional: false PVC Name: kafka-data-kafka-0 Namespace: kafka-shared-cluster StorageClass: local-storage Status: Bound Volume: pvc-1d23ba70-cb15-43c3-91b1-4febc8fd9896 Labels: app=kafka Annotations: pv.kubernetes.io/bind-completed: yes pv.kubernetes.io/bound-by-controller: yes volume.beta.kubernetes.io/storage-provisioner: rancher.io/local-path volume.kubernetes.io/selected-node: xxx Finalizers: [kubernetes.io/pvc-protection] Capacity: 10Gi Access Modes: RWO VolumeMode: Filesystem Used By: kafka-0 Events: <none>
Error 503 Backend fetch failed Guru Meditation: XID: 45654 Varnish cache server
I have created helm chart for varnish cache server which is running in kubernetes cluster , while testing with the "external IP" generated its throwing error , sharing below Sharing varnish.vcl, values.yaml and deployment.yaml below . Any suggestions how to resolve as I have hardcoded the backend/web server as .host="www.varnish-cache.org" with port : "80". My requirement is on executing curl -IL I should get the response with cached values not as described above (directly from backend server).. Any solutions/approach would be welcomed. varnish.vcl: VCL version 5.0 is not supported so it should be 4.0 or 4.1 even though actually used Varnish version is 6 vcl 4.1; import std; # The minimal Varnish version is 5.0 # For SSL offloading, pass the following header in your proxy server or load balancer: 'X-Forwarded-Proto: https' {{ .Values.varnishconfigData | indent 2 }} sub vcl_recv { # set req.backend_hint = default; # unset req.http.cookie; if(req.url == "/healthcheck") { return(synth(200,"OK")); } if(req.url == "/index.html") { return(synth(200,"OK")); } } probe index { .url = "/index.html"; .timeout = 60ms; .interval = 2s; .window = 5; .threshold = 3; } backend website { .host = "www.varnish-cache.org"; .port = "80"; .probe = index; #.probe = { # .url = "/favicon.ico"; #.timeout = 60ms; #.interval = 2s; #.window = 5; #.threshold = 3; # } } vcl_recv { if ( req.url ~ "/index.html/") { set req.backend = website; } else { Set req.backend = default; } } #DAEMON_OPTS="-a :80 \ #-T localhost:6082 \ #-f /etc/varnish/default.vcl \ #-S /etc/varnish/secret \ #-s malloc,256m" #-p http_resp_hdr_len=65536 \ #-p http_resp_size=98304 \ #sub vcl_recv { ## # Remove the cookie header to enable caching # unset req.http.cookie; #} #sub vcl_deliver { # if (obj.hits > 0) { # set resp.http.X-Cache = "HIT"; # } else { # set resp.http.X-Cache = "MISS"; # } #} values.yaml: # Default values for varnish. # This is a YAML-formatted file. # Declare variables to be passed into your templates. 
replicaCount: 1

image:
  repository: varnish
  tag: 6.3
  pullPolicy: IfNotPresent

nameOverride: ""
fullnameOverride: ""

service:
  # type: ClusterIP
  type: LoadBalancer
  port: 80

varnishconfigData: |-
  backend default {
    .host = "http://35.170.216.115/";
    .port = "80";
    .first_byte_timeout = 60s;
    .connect_timeout = 300s;
    .probe = {
      .url = "/";
      .timeout = 1s;
      .interval = 5s;
      .window = 5;
      .threshold = 3;
    }
  }

  sub vcl_backend_response {
    set beresp.ttl = 5m;
  }

ingress:
  enabled: false
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  path: /
  hosts:
    - chart-example.local
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

resources:
  limits:
    memory: 128Mi
  requests:
    memory: 64Mi

#resources: {}
  # We usually recommend not to specify default resources and to leave this as a conscious
  # choice for the user. This also increases chances charts run on environments with little
  # resources, such as Minikube. If you do want to specify resources, uncomment the following
  # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

nodeSelector: {}

tolerations: []

affinity: {}

Deployment.yaml:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: {{ include "varnish.fullname" . }}
  labels:
    app: {{ include "varnish.name" . }}
    chart: {{ include "varnish.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ include "varnish.name" . }}
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ include "varnish.name" . }}
        release: {{ .Release.Name }}
    spec:
      volumes:
        - name: varnish-config
          configMap:
            name: {{ include "varnish.fullname" . }}-varnish-config
            items:
              - key: default.vcl
                path: default.vcl
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          env:
            - name: VARNISH_VCL
              value: /etc/varnish/default.vcl
          volumeMounts:
            - name: varnish-config
              mountPath: /etc/varnish/
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
              targetPort: 80
          livenessProbe:
            httpGet:
              path: /healthcheck
              # port: http
              port: 80
            failureThreshold: 3
            initialDelaySeconds: 45
            timeoutSeconds: 10
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /healthcheck
              # port: http
              port: 80
            initialDelaySeconds: 10
            timeoutSeconds: 15
            periodSeconds: 5
          resources:
{{ toYaml .Values.resources | indent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
{{ toYaml . | indent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
{{ toYaml . | indent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
{{ toYaml . | indent 8 }}
      {{- end }}

I did check the Varnish logs: I executed varnishlog -c and got the following output:

*   << Request  >> 556807
-   Begin          req 556806 rxreq
-   Timestamp      Start: 1584534974.251924 0.000000 0.000000
-   Timestamp      Req: 1584534974.251924 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       100.115.128.0 26466 a0
-   ReqMethod      GET
-   ReqURL         /healthcheck
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: 100.115.128.11:80
-   ReqHeader      User-Agent: kube-probe/1.14
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      Connection: close
-   ReqHeader      X-Forwarded-For: 100.115.128.0
-   VCL_call       RECV
-   VCL_return     synth
-   VCL_call       HASH
-   VCL_return     lookup
-   Timestamp      Process: 1584534974.251966 0.000042 0.000042
-   RespHeader     Date: Wed, 18 Mar 2020 12:36:14 GMT
-   RespHeader     Server: Varnish
-   RespHeader     X-Varnish: 556807
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespReason     OK
-   VCL_call       SYNTH
-   RespHeader     Content-Type: text/html; charset=utf-8
-   RespHeader     Retry-After: 5
-   VCL_return     deliver
-   RespHeader     Content-Length: 229
-   Storage        malloc Transient
-   Filters
-   RespHeader     Accept-Ranges: bytes
-   RespHeader     Connection: close
-   Timestamp      Resp: 1584534974.252121 0.000197 0.000155
-   ReqAcct        125 0 125 210 229 439
-   End
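As an aside, the second routing block in the varnish.vcl above will not even compile under VCL 4.x: subroutine definitions require the sub keyword, set is case-sensitive (Set is a syntax error), and req.backend was replaced by req.backend_hint in VCL 4.0. A minimal corrected sketch, reusing the website and default backends already defined in the question:

```
sub vcl_recv {
  # VCL 4.x selects a backend via req.backend_hint, not req.backend.
  if (req.url ~ "/index.html") {
    set req.backend_hint = website;
  } else {
    set req.backend_hint = default;
  }
}
```

Note this would also collide with the sub vcl_recv defined earlier in the same file; Varnish concatenates same-named subroutines, so the earlier synth returns for /healthcheck and /index.html would short-circuit before this routing runs.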
I don't think this will work:

.host = "www.varnish-cache.org";
.host = "100.68.38.132"

It has two host declarations, and the second one is missing the ";". Please try changing it to:

.host = "100.68.38.132";

Sharing the logs generated when running the following command; please look into it:

varnishlog -g request -q "ReqHeader:Host eq 'a2dc15095678711eaae260ae72bc140c-214951329.ap-southeast-1.elb.amazonaws.com'" -q "ReqUrl eq '/'"

*   << Request  >> 1512355
-   Begin          req 1512354 rxreq
-   Timestamp      Start: 1584707667.287292 0.000000 0.000000
-   Timestamp      Req: 1584707667.287292 0.000000 0.000000
-   VCL_use        boot
-   ReqStart       100.112.64.0 51532 a0
-   ReqMethod      GET
-   ReqURL         /
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: 52.220.214.66
-   ReqHeader      User-Agent: Mozilla/5.0 zgrab/0.x
-   ReqHeader      Accept: */*
-   ReqHeader      Accept-Encoding: gzip
-   ReqHeader      X-Forwarded-For: 100.112.64.0
-   VCL_call       RECV
-   ReqUnset       Host: 52.220.214.66
-   ReqHeader      host: 52.220.214.66
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   VCL_call       MISS
-   VCL_return     fetch
-   Link           bereq 1512356 fetch
-   Timestamp      Fetch: 1584707667.287521 0.000228 0.000228
-   RespProtocol   HTTP/1.1
-   RespStatus     503
-   RespReason     Backend fetch failed
-   RespHeader     Date: Fri, 20 Mar 2020 12:34:27 GMT
-   RespHeader     Server: Varnish
-   RespHeader     Content-Type: text/html; charset=utf-8
-   RespHeader     Retry-After: 5
-   RespHeader     X-Varnish: 1512355
-   RespHeader     Age: 0
-   RespHeader     Via: 1.1 varnish (Varnish/6.3)
-   VCL_call       DELIVER
-   RespHeader     X-Cache: uncached
-   VCL_return     deliver
-   Timestamp      Process: 1584707667.287542 0.000250 0.000021
-   Filters
-   RespHeader     Content-Length: 284
-   RespHeader     Connection: keep-alive
-   Timestamp      Resp: 1584707667.287591 0.000299 0.000048
-   ReqAcct        110 0 110 271 284 555
-   End

**  << BeReq    >> 1512356
--  Begin          bereq 1512355 fetch
--  VCL_use        boot
--  Timestamp      Start: 1584707667.287401 0.000000 0.000000
--  BereqMethod    GET
--  BereqURL       /
--  BereqProtocol  HTTP/1.1
--  BereqHeader    User-Agent: Mozilla/5.0 zgrab/0.x
--  BereqHeader    Accept: */*
--  BereqHeader    Accept-Encoding: gzip
--  BereqHeader    X-Forwarded-For: 100.112.64.0
--  BereqHeader    host: 52.220.214.66
--  BereqHeader    X-Varnish: 1512356
--  VCL_call       BACKEND_FETCH
--  VCL_return     fetch
--  FetchError     backend default: unhealthy
--  Timestamp      Beresp: 1584707667.287429 0.000028 0.000028
--  Timestamp      Error: 1584707667.287432 0.000031 0.000002
--  BerespProtocol HTTP/1.1
--  BerespStatus   503
--  BerespReason   Service Unavailable
--  BerespReason   Backend fetch failed
--  BerespHeader   Date: Fri, 20 Mar 2020 12:34:27 GMT
--  BerespHeader   Server: Varnish
--  VCL_call       BACKEND_ERROR
--  BerespHeader   Content-Type: text/html; charset=utf-8
--  BerespHeader   Retry-After: 5
--  VCL_return     deliver
--  Storage        malloc Transient
--  Length         284
--  BereqAcct      0 0 0 0 0 0
--  End
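The key line in the BeReq record is FetchError backend default: unhealthy: the health probe against the default backend is failing, so Varnish refuses to fetch and synthesizes the 503. One likely cause is the .host value in values.yaml, which is written as a URL (http://35.170.216.115/); a Varnish backend .host must be a bare hostname or IP address, with no scheme and no path. A hedged sketch of a corrected default backend, reusing the values from the question:

```
# Sketch only: 35.170.216.115 is the IP from the question's values.yaml.
# .host must be a bare host or IP -- "http://..." makes the probe fail.
backend default {
  .host = "35.170.216.115";
  .port = "80";
  .first_byte_timeout = 60s;
  .connect_timeout = 300s;
  .probe = {
    .url = "/";
    .timeout = 1s;
    .interval = 5s;
    .window = 5;
    .threshold = 3;
  }
}
```

After reloading the VCL, probe status per backend can be inspected from inside the pod with varnishadm backend.list, which should show the backend transition to Healthy once the probe starts succeeding.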