How to remove unwanted characters from Fluentd logs - Kubernetes

Currently I am sending my Kubernetes logs to CloudWatch using Fluentd, but when I check the logs in CloudWatch, they contain extra Unicode characters. I tried different approaches and regexps to remove them, but with no luck. Here is a sample of how my log appears in CloudWatch:
Log in CloudWatch: "log": "\u001b[2m2021-10-13 20:07:10.351\u001b[0;39m \u001b[32m INFO\u001b[0;39m \u001b[35m1\u001b[0;39m \u001b[2m---\u001b[0;39m \u001b[2m[trap-executor-0]\u001b[0;39m \u001b[36mc.n.d.s.r.aws.ConfigClusterResolver \u001b[0;39m \u001b[2m:\u001b[0;39m Resolving eureka endpoints via configuration\n"
Actual log: 2021-10-13 20:07:10.351 INFO 1 --- [trap-executor-0] c.n.d.s.r.aws.ConfigClusterResolver : Resolving eureka endpoints via configuration
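The extra characters are ANSI color escape sequences (\u001b[...m) emitted by the application. A minimal sketch of one way to strip them in Fluentd, assuming the stock record_transformer filter and a kubernetes.** tag prefix (both assumptions, not from the original post):
<filter kubernetes.**>
  @type record_transformer
  enable_ruby true
  <record>
    # Strip ANSI SGR color codes such as \e[2m and \e[0;39m from the log field
    log ${record["log"].gsub(/\e\[[0-9;]*m/, '')}
  </record>
</filter>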

Related

Consul agent on kubernetes, on node or pod?

I deployed an AWS EKS cluster via Terraform. I also deployed Consul following HashiCorp's tutorial, and I see the nodes in Consul's UI.
Now I'm wondering how all the Consul agents will know about the pods I deploy? I deploy something and it's not shown anywhere in Consul.
I can't find any documentation on how to register pods (services) in Consul via the node's Consul agent; do I need to configure that somewhere? Should I not use the node's agent and instead register the service straight from the pod? HashiCorp discourages this since it may increase resource utilization depending on how many pods one deploys on a given node. But then how does the node's agent know about my services deployed on that node?
Moreover, when I deploy a pod on a node, SSH into the node, and install Consul, that Consul agent can't find the Consul server (as opposed to the node's own agent, which can find it).
EDIT:
Bottom line is I can't find WHERE to add the configuration. If I execute ON THE POD:
consul members
It works properly and I get:
Node                          Address            Status  Type    Build   Protocol  DC    Segment
consul-consul-server-0        10.0.103.23:8301   alive   server  1.10.0  2         full  <all>
consul-consul-server-1        10.0.101.151:8301  alive   server  1.10.0  2         full  <all>
consul-consul-server-2        10.0.102.112:8301  alive   server  1.10.0  2         full  <all>
ip-10-0-101-129.ec2.internal  10.0.101.70:8301   alive   client  1.10.0  2         full  <default>
ip-10-0-102-175.ec2.internal  10.0.102.244:8301  alive   client  1.10.0  2         full  <default>
ip-10-0-103-240.ec2.internal  10.0.103.245:8301  alive   client  1.10.0  2         full  <default>
ip-10-0-3-223.ec2.internal    10.0.3.249:8301    alive   client  1.10.0  2         full  <default>
But if I execute:
# consul agent -datacenter=voip-full -config-dir=/etc/consul.d/ -log-file=log-file -advertise=$(wget -q -O - http://169.254.169.254/latest/meta-data/local-ipv4)
I get the following error:
==> Starting Consul agent...
Version: '1.10.1'
Node ID: 'f10070e7-9910-06c7-0e12-6edb6cc4c9b9'
Node name: 'ip-10-0-3-223.ec2.internal'
Datacenter: 'voip-full' (Segment: '')
Server: false (Bootstrap: false)
Client Addr: [127.0.0.1] (HTTP: 8500, HTTPS: -1, gRPC: -1, DNS: 8600)
Cluster Addr: 10.0.3.223 (LAN: 8301, WAN: 8302)
Encrypt: Gossip: false, TLS-Outgoing: false, TLS-Incoming: false, Auto-Encrypt-TLS: false
==> Log data will now stream in as it occurs:
2021-08-16T18:23:06.936Z [WARN] agent: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.936Z [WARN] agent: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.946Z [WARN] agent.auto_config: skipping file /etc/consul.d/consul.env, extension must be .hcl or .json, or config format must be set
2021-08-16T18:23:06.947Z [WARN] agent.auto_config: Node name "ip-10-0-3-223.ec2.internal" will not be discoverable via DNS due to invalid characters. Valid characters include all alpha-numerics and dashes.
2021-08-16T18:23:06.948Z [INFO] agent.client.serf.lan: serf: EventMemberJoin: ip-10-0-3-223.ec2.internal 10.0.3.223
2021-08-16T18:23:06.948Z [INFO] agent.router: Initializing LAN area manager
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=udp
2021-08-16T18:23:06.950Z [WARN] agent.client.serf.lan: serf: Failed to re-join any previously known node
2021-08-16T18:23:06.950Z [INFO] agent: Started DNS server: address=127.0.0.1:8600 network=tcp
2021-08-16T18:23:06.951Z [INFO] agent: Starting server: address=127.0.0.1:8500 network=tcp protocol=http
2021-08-16T18:23:06.951Z [WARN] agent: DEPRECATED Backwards compatibility with pre-1.9 metrics enabled. These metrics will be removed in a future version of Consul. Set `telemetry { disable_compat_1.9 = true }` to disable them.
2021-08-16T18:23:06.953Z [INFO] agent: started state syncer
2021-08-16T18:23:06.953Z [INFO] agent: Consul agent running!
2021-08-16T18:23:06.953Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:06.954Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
2021-08-16T18:23:34.169Z [WARN] agent.router.manager: No servers available
2021-08-16T18:23:34.169Z [ERROR] agent.anti_entropy: failed to sync remote state: error="No known Consul servers"
So where do I add the config?
I also tried adding a Kubernetes Service pointing to the pod, but the service doesn't come up in Consul's UI...
What do you guys recommend?
Thanks
Consul knows where these services are located because each service
registers with its local Consul client. Operators can register
services manually, configuration management tools can register
services when they are deployed, or container orchestration platforms
can register services automatically via integrations.
If you are planning to use the manual option, you have to register the service with Consul.
Something like:
echo '{
  "service": {
    "name": "web",
    "tags": [
      "rails"
    ],
    "port": 80
  }
}' > ./consul.d/web.json
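After creating the file, the agent can pick it up with a reload (assuming its -config-dir points at ./consul.d, as in the tutorial layout):
# re-reads service and check definitions from the configuration directory
consul reload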
You can find a good example at: https://thenewstack.io/implementing-service-discovery-of-microservices-with-consul/
This is also a very useful document for detailed configuration of health checks and service discovery: https://cloud.spring.io/spring-cloud-consul/multi/multi_spring-cloud-consul-discovery.html
Official documentation: https://learn.hashicorp.com/tutorials/consul/get-started-service-discovery
BTW, I was finally able to figure out the issue.
consul-dns is not deployed by default; I had to deploy it manually, then forward all .consul requests from CoreDNS to consul-dns.
All is working now. Thanks!
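For reference, forwarding .consul queries from CoreDNS to consul-dns is typically done with a stanza like this in the CoreDNS Corefile (a sketch; replace the placeholder with the ClusterIP of your consul-dns Service):
consul:53 {
    errors
    cache 30
    # hand every query under .consul to the consul-dns service
    forward . <CONSUL_DNS_SERVICE_IP>
}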

Unable to Run a Composed Task in Spring Cloud Data Flow

I am running the latest version of the SCDF server on a Kubernetes cluster. Every time I try to run a composed task, it tries to fetch the application properties for the composed-task-runner application and fails to launch the composed task.
First of all, SCDF is trying to pull the properties (metadata) from the Spring Maven repo when I am running the server on k8s. My server is behind a firewall and it cannot connect to the Spring Maven repo. I already downloaded the composed-task-runner Docker image to my local repo and added the composed-task-runner application using the UI. Why does it still try to download metadata from the Spring Maven repo? How do I stop it?
Here is the log:
2020-11-21 15:49:07.591 INFO 1 --- [nio-8080-exec-4] o.s.c.d.s.k.DefaultContainerFactory : Using Docker entry point style: exec
2020-11-21 15:49:58.355 WARN 1 --- [nio-8080-exec-6] .s.c.d.s.s.i.TaskConfigurationProperties : org.springframework.cloud.dataflow.server.service.impl.TaskConfigurationProperties.logDeprecationWarning is deprecated. Please use org.springframework.cloud.dataflow.server.service.impl.ComposedTaskRunnerConfigurationProperties.logDeprecationWarning
2020-11-21 15:50:18.427 WARN 1 --- [nio-8080-exec-6] ApplicationConfigurationMetadataResolver : Failed to retrieve properties for resource org.springframework.cloud:spring-cloud-dataflow-composed-task-runner:jar:2.7.0-SNAPSHOT because of ConnectTimeoutException: Connect to repo.spring.io:443 timed out
2020-11-21 15:50:38.522 WARN 1 --- [nio-8080-exec-6] ApplicationConfigurationMetadataResolver : Failed to retrieve properties for resource org.springframework.cloud:spring-cloud-dataflow-composed-task-runner:jar:2.7.0-SNAPSHOT because of ConnectTimeoutException: Connect to repo.spring.io:443 timed out
2020-11-21 15:50:38.572 INFO 1 --- [nio-8080-exec-6] o.s.c.d.s.k.KubernetesTaskLauncher : Preparing to run a container from org.springframework.cloud:spring-cloud-dataflow-composed-task-runner:jar:2.7.0-SNAPSHOT. This may take some time if the image must be downloaded from a remote container registry.
2020-11-21 15:50:38.573 INFO 1 --- [nio-8080-exec-6] o.s.c.d.s.k.DefaultContainerFactory : Using Docker image: //org.springframework.cloud:spring-cloud-dataflow-composed-task-runner:jar:2.7.0-SNAPSHOT
It looks like the Composed Task Runner Docker image can now be set using this environment variable:
name: SPRING_CLOUD_DATAFLOW_TASK_COMPOSED_TASK_RUNNER_URI
value: 'docker://springcloud/spring-cloud-dataflow-composed-task-runner:2.6.0'
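In a Kubernetes deployment of the SCDF server, this would typically sit in the env section of the server's container spec; a sketch (the surrounding Deployment excerpt and the container name scdf-server are assumptions, only the env entry comes from the lines above):
spec:
  containers:
    - name: scdf-server   # hypothetical container name
      env:
        - name: SPRING_CLOUD_DATAFLOW_TASK_COMPOSED_TASK_RUNNER_URI
          value: 'docker://springcloud/spring-cloud-dataflow-composed-task-runner:2.6.0'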
We were on SCDF server version 2.2.4 before this, and we had to manually add the composed task runner as an application using the dashboard UI.
Right now, all I had to do was download this image, push it to my local registry, and use it here.

fluentbit writes to /var/log/messages

I'm running Fluent Bit (td-agent-bit) on a CentOS system in order to ship all logs to a centralized system. Every time Fluent Bit pushes a record to the remote location, it also adds a record to /var/log/messages, leading to a huge log file.
Jul 21 08:48:53 hostname td-agent-bit: [2020/07/21 08:48:53] [ info] [out_azure] customer_id=XXXXXXXXXXXXXXXXXXXXXXXX, HTTP status=200
Any idea how I can stop the service (td-agent-bit) from writing to /var/log/messages? I couldn't find any suitable configuration parameter (e.g. for verbosity) in the Fluent Bit documentation. Thanks!
Your log_level is "info", which includes a lot of messages from the pipeline. You can decrease the log level inside the output section of the plugin to "error" only, e.g.:
[OUTPUT]
    name      azure
    match     *
    log_level error
Note: alternatively, you can decrease the global log_level in the main [SERVICE] section.
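For example, a minimal sketch of such a [SERVICE] section (the flush value is an illustrative placeholder, not from the original answer):
[SERVICE]
    # emit error messages only, suppressing the [ info] pipeline records
    log_level error
    flush     5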

fluentd isn't shipping logs to Stackdriver

I have an application deployed on Kubernetes on GKE:
Kubernetes version: v1.7.11-gke.1
Stackdriver Logging is enabled on my cluster
fluentd-gcp image on my cluster (by default):
gcr.io/google-containers/fluentd-gcp:2.0.9
My logs were all OK and visible in Stackdriver, but since a few days ago, logs from one deployment (let's call it my-app) have stopped arriving in Stackdriver, even though they are logged from my app:
kubectl logs -f my-app-3270987706-cx0r2 --namespace=production
{"time":"2018-01-30 16:11:13.155","msg":"ignoring xml"}
{"time":"2018-01-30 16:11:14.155","msg":"success blabla"}
I see the following logs from fluentd:
2018-01-30 16:11:46 +0000 [warn]: emit transaction failed: error_class=Errno::ENOENT error="No such file or directory @ sys_fail2 - (/var/log/fluentd-buffers/kubernetes.system.buffer..b563203c1da7cb5e1.log, /var/log/fluentd-buffers/kubernetes.system.buffer..q563203c1da7cb5e1.log)" tag="docker"
2018-01-30 16:11:46 +0000 [warn]: suppressed same stacktrace
2018-01-30 16:11:46 +0000 [error]: Exception emitting record: No such file or directory @ sys_fail2 - (/var/log/fluentd-buffers/kubernetes.system.buffer..b563203c1da7cb5e1.log, /var/log/fluentd-buffers/kubernetes.system.buffer..q563203c1da7cb5e1.log)
Why aren't the logs shipped to Stackdriver? How can I fix it?
EDIT:
I'll note that the logs of other apps do appear in Stackdriver.
The logs of the failing app are very big - maybe that's why they fail to ship?

Spring Boot with server.contextPath set vs. URL to hystrix.stream via Eureka Server

I have a Eureka Server with a Turbine instance running and a few discovery clients that are connected to it. Everything works fine, but if I register a discovery client that has server.contextPath set, it doesn't get recognized by InstanceMonitor, and the Turbine stream is not able to combine its hystrix.stream.
This is how it looks in the logs of Eureka/Turbine server:
2015-02-12 06:56:23.265 INFO 1 --- [ Timer-0] c.n.t.discovery.InstanceObservable : Hosts up:3, hosts down: 0
2015-02-12 06:56:23.266 INFO 1 --- [ Timer-0] c.n.t.monitor.instance.InstanceMonitor : Url for host: http://user-service:8887/hystrix.stream default
2015-02-12 06:56:23.268 ERROR 1 --- [InstanceMonitor] c.n.t.monitor.instance.InstanceMonitor : Could not initiate connection to host, giving up: []
2015-02-12 06:56:23.269 WARN 1 --- [InstanceMonitor] c.n.t.monitor.instance.InstanceMonitor : Stopping InstanceMonitor for: user-service default
com.netflix.turbine.monitor.instance.InstanceMonitor$MisconfiguredHostException: []
at com.netflix.turbine.monitor.instance.InstanceMonitor.init(InstanceMonitor.java:318)
at com.netflix.turbine.monitor.instance.InstanceMonitor.access$100(InstanceMonitor.java:103)
at com.netflix.turbine.monitor.instance.InstanceMonitor$2.call(InstanceMonitor.java:235)
at com.netflix.turbine.monitor.instance.InstanceMonitor$2.call(InstanceMonitor.java:229)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It tries to get the hystrix stream from http://user-service:8887/hystrix.stream, whereas the correct URL, including server.contextPath, should be http://user-service:8887/uaa/hystrix.stream
The application.yml of that client contains:
server:
  port: 8887
  contextPath: /uaa
security:
  ignored: /css/**,/js/**,/favicon.ico,/webjars/**
  basic:
    enabled: false
My question is: should I add some additional configuration options to this user-service discovery client so that it registers the proper hystrix.stream URL?
I haven't dug into that yet; I will let you know if I find something before getting an answer to this question.
Current solution
There is one problem when it comes to using server.contextPath and management.context-path. When both are set, the Turbine stream is served on ${HOST_URL}/${server.contextPath}/${management.context-path}/hystrix.stream. In that case I had to drop server.contextPath (I replaced it with a prefix in the controllers' @RequestMapping).
Now, when you use management.context-path, your hystrix.stream is served from the URL that uses it as a prefix. In that case you have to follow Spencer's suggestion and set
turbine.instanceUrlSuffix=/{PUT_YOUR_MANAGEMENT_CONTEXT_PATH_HERE}/hystrix.stream
And of course this management.context-path must be set to the same value for all your discovery clients - that can be done easily with Spring Cloud Config: http://cloud.spring.io/spring-cloud-config/spring-cloud-config.html
You can set turbine.instanceUrlSuffix.<CLUSTERNAME>=/uaa/hystrix.stream, where <CLUSTERNAME> is the value set in turbine.aggregator.clusterConfig. All of the config options from the Turbine 1 wiki work. You don't need to add the port to the suffix, as Spring Cloud Netflix Turbine adds the port from Eureka.
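Putting that together, the Turbine server configuration might look like the following sketch (the cluster name USER-SERVICE and app name user-service are assumptions inferred from the logs above; only the suffix line comes from the answer):
# hypothetical application.properties for the Turbine server
turbine.aggregator.clusterConfig=USER-SERVICE
turbine.appConfig=user-service
# serve hystrix.stream from behind the /uaa context path
turbine.instanceUrlSuffix.USER-SERVICE=/uaa/hystrix.stream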