logger messages in zipkin - spring-cloud

I am new to Sleuth and Zipkin. I have logged some messages, and Sleuth appends a trace id and span id to them. I am using Zipkin to visualize the traces, and I can see the timings across the different microservices. Can we also see the logger messages (the ones we write in the different microservices) in the Zipkin UI by trace id?

No, you can't. You can use a stack such as Elasticsearch, Logstash and Kibana (ELK) to visualize the logs. You can go to my repo https://github.com/marcingrzejszczak/docker-elk and run ./getReadyForConference.sh; it will start Docker containers with the ELK stack, run the apps, and curl requests to the apps so that you can then check the logs in ELK.
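What a log store gives you here is essentially a lookup of log lines by trace id. A minimal Python sketch of that idea, assuming the log lines carry Sleuth's bracketed block of application name, trace id and span id (older Sleuth versions add a fourth "exportable" field); the service names and sample lines are hypothetical:

```python
import re
from collections import defaultdict

# Matches the correlation block Sleuth adds to each line, e.g.
# "[service-a,6bfd228dc00d216b,6bfd228dc00d216b,false]".
# The fourth field is optional because newer Sleuth versions omit it.
SLEUTH_BLOCK = re.compile(
    r"\[(?P<app>[^,\]]*),(?P<trace>[0-9a-f]{16,32}),(?P<span>[0-9a-f]{16})(?:,[^\]]*)?\]"
)

def group_by_trace(lines):
    """Return {trace_id: [(app, span_id, line), ...]} for lines that carry Sleuth ids."""
    grouped = defaultdict(list)
    for line in lines:
        match = SLEUTH_BLOCK.search(line)
        if match is None:
            continue  # line was logged outside of any trace
        grouped[match["trace"]].append((match["app"], match["span"], line))
    return dict(grouped)

# Hypothetical sample lines from two services handling one request.
sample = [
    "2016-02-02 15:30:57.902  INFO [service-a,6bfd228dc00d216b,6bfd228dc00d216b,false] 23030 --- [nio-8081-exec-3] c.e.A : calling service-b",
    "2016-02-02 15:30:58.019  INFO [service-b,6bfd228dc00d216b,a9c1f3e2b4d5c6e7,false] 23031 --- [nio-8082-exec-1] c.e.B : handling request",
    "2016-02-02 15:31:00.000  INFO [service-a,,,] 23030 --- [main] c.e.A : started",
]

for trace_id, entries in group_by_trace(sample).items():
    print(trace_id, [app for app, _, _ in entries])
```

In ELK the same grouping is done by parsing the trace id into its own field in Logstash and filtering on it in Kibana.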

Related

How to send logs from Google Stackdriver to Kafka

I see many docs and posts about how to send logs to Stackdriver but almost no information about how to do the opposite - send logs from the Stackdriver to Kafka.
In my case, our Ops team wants to collect the logs from our web servers using Google's Stackdriver agents and push them to Stackdriver ... However, for my stream processing needs I want to get the logs into Kafka to use its unparalleled ability to retain and reprocess data by any number of consumers, something that I cannot do with PubSub.
So, what are the options for doing this? I only saw a couple of possible avenues - neither sounds too good:
Based on this post (https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76), push data into PubSub first, and then read from it using either a Kafka connector or my own Kafka consumer. I hate the thought of adding yet another hop (serialize/deserialize/ack/etc.) between the source of data and Kafka ....
I noticed a brief mention, in passing, of adding a plugin to Google's version of Fluentd (which is what the Stackdriver log collection agent is based on) here: https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76 . There are not many details, so it is hard to tell how involved this approach is ...
Any other options?
Thank you!
Enter the Kafka console and add some elements. Once you have added them, check that they are reflected in the Cloud Shell by running:
$ gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
It will take some time to sync with the Kafka console, so you may have to run the command a couple of times before you get the results.
You will run the commands in the Cloud Shell and see the output in the Kafka VM SSH.
Now verify the opposite direction: run the command in the Kafka VM and watch for the output in the Cloud Shell. It will take some time for the output to be reflected, and you may have to run the following command a couple of times to see it:
$ gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
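For the Pub/Sub route from the question, the extra hop amounts to decoding each pulled message and re-publishing it to Kafka. A minimal Python sketch of the decoding half, assuming a log sink exports entries to a Pub/Sub topic and that each pulled message carries one JSON LogEntry, base64-encoded in message.data (the shape returned by the Pub/Sub REST pull call). The project, log name and payload are hypothetical, and keying by logName is only one possible choice:

```python
import base64
import json

def to_kafka_record(received_message):
    """Turn one pulled Pub/Sub message holding a LogEntry into (key, value) bytes."""
    raw = base64.b64decode(received_message["message"]["data"])
    entry = json.loads(raw)
    # Assumption: keying by logName keeps entries of one log in one partition.
    key = entry.get("logName", "").encode("utf-8")
    value = json.dumps(entry, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return key, value

# Hypothetical exported log entry and the envelope a pull would return for it.
entry = {
    "insertId": "abc123",
    "logName": "projects/my-project/logs/nginx-access",
    "timestamp": "2020-12-07T10:30:28.349Z",
    "textPayload": "GET /index.html 200",
}
pulled = {
    "ackId": "hypothetical-ack-id",
    "message": {
        "data": base64.b64encode(json.dumps(entry).encode("utf-8")).decode("ascii"),
        "messageId": "1",
    },
}

key, value = to_kafka_record(pulled)
print(key.decode("utf-8"), json.loads(value)["textPayload"])
```

The resulting key and value bytes can be handed to whichever Kafka producer client is in use; acknowledging the Pub/Sub message only after the Kafka send succeeds avoids losing entries in between.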
The Kafka plugin is deprecated. For more information, refer to https://cloud.google.com/stackdriver/docs/deprecations
Note: This functionality is only available for agents running on Linux. It is not available on Windows.
Kafka is monitored via JMX. Monitoring supports Kafka version 0.8.2 and higher.
On your VM instance, download kafka-082.conf from the GitHub configuration repository and place it in the directory /etc/stackdriver/collectd.d/:
(cd /etc/stackdriver/collectd.d/ && sudo curl -O https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/kafka-082.conf)
The downloaded plugin configuration file assumes that your Kafka server is configured to accept JMX connections on port 9999. If you have configured Kafka with a different JMX port, as root, edit the file and follow the instructions to change the JMX port settings.
After adding the configuration file, restart the Monitoring agent by running the following command:
sudo service stackdriver-agent restart
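If Kafka metrics do not show up after the restart, one thing worth ruling out is that nothing is listening on the JMX port the configuration file expects. A minimal Python sketch, assuming the default port 9999 mentioned above and a broker on the same VM; a successful TCP connect only shows that something is listening there, not that it speaks JMX:

```python
import socket

def jmx_port_reachable(host="localhost", port=9999, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False

print("JMX port reachable:", jmx_port_reachable())
```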
What is monitored:
https://cloud.google.com/monitoring/api/metrics_agent#agent-kafka

Dynamic creation of Kafka Connectors

I have deployed a Kafka cluster and a Kafka Connect cluster in Kubernetes, using Strimzi and AKS. I wanted to start reading from RSS resources to feed my Kafka cluster, so I created a connector instance of "org.kaliy.kafka.connect.rss.RssSourceConnector", which reads from a specific RSS feed, given a URL, and writes to a specific topic. But my whole intention with this is to eventually have a Kafka Connect cluster able to manage a lot of external requests for new RSS feeds to read from, and here is where all my doubts come in:
Should I create an instance of the Kaliy RSS connector for each RSS feed? Or would it be better to implement my own connector, so that I create only one instance of it and, each time I want to read a new RSS feed, create a new Task in the connector?
Who should be responsible for ensuring that the Kafka Connect cluster is in the desired state? I mean, if a Connector (in the case of 1 RSS feed : 1 Connector instance) stopped working, who should try to start it again? An external client via the Kafka Connect REST API? Kubernetes itself?
Right now, I think my best option is to rely on the Kafka Connect REST API, making the external client responsible for managing the state of the set of connectors, but I don't know whether the API was designed to receive as many requests as it would in this case. Maybe it could be scaled by provisioning several listeners in the Kafka Connect REST API configuration, but I do not know.
Thanks a lot!
One of the main benefits in using Kafka Connect is to leverage a configuration-driven approach, so you will lose this by implementing your own Connector. In my opinion, the best strategy is to have one Connector instance for each RSS feed. Reducing the number of instances could make sense when having a single data source system, to avoid overloading it.
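To illustrate the one-connector-per-feed approach, here is a minimal Python sketch that generates one connector definition per RSS feed, in the JSON shape that the Kafka Connect REST API accepts on POST /connectors. The feed URLs, topic name and naming scheme are hypothetical, and the connector-specific property names ("rss.urls", "topic") are assumptions that should be checked against the connector's README:

```python
import json
import re

CONNECTOR_CLASS = "org.kaliy.kafka.connect.rss.RssSourceConnector"

def connector_name(feed_url):
    """Derive a stable connector name from a feed URL (hypothetical naming scheme)."""
    slug = re.sub(r"[^a-z0-9]+", "-", feed_url.lower()).strip("-")
    return f"rss-{slug}"

def connector_request(feed_url, topic):
    """Build the JSON body for creating one connector for one feed."""
    return {
        "name": connector_name(feed_url),
        "config": {
            "connector.class": CONNECTOR_CLASS,
            "tasks.max": "1",
            # Assumed property names; verify against the connector documentation.
            "rss.urls": feed_url,
            "topic": topic,
        },
    }

feeds = ["https://example.com/news/rss.xml", "https://example.org/blog/feed"]
for url in feeds:
    print(json.dumps(connector_request(url, "rss-items"), indent=2))
```

Because each feed is only a small piece of configuration, adding a feed means creating one more definition rather than changing connector code.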
With the Strimzi Operator, the Kafka Connect cluster is monitored, and the operator will try to restore the desired cluster state when needed. This does not cover the individual Connector instances and their tasks, but you can use the K8s API to monitor the status of the Connector custom resource (CR) instead of the REST API.
Example:
$ kubectl get kafkaconnector amq-sink -o yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
# ...
status:
  conditions:
  - lastTransitionTime: "2020-12-07T10:30:28.349Z"
    status: "True"
    type: Ready
  connectorStatus:
    connector:
      state: RUNNING
      worker_id: 10.116.0.66:8083
    name: amq-sink
    tasks:
    - id: 0
      state: RUNNING
      worker_id: 10.116.0.66:8083
    type: sink
  observedGeneration: 1
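The status above can also be checked by a small watchdog. A minimal Python sketch, assuming the resource is fetched with -o json instead of -o yaml and has the same status layout as shown; the FAILED task in the sample is a hypothetical variation of that output:

```python
import json

def unhealthy_parts(resource):
    """List (part, state) pairs for the connector and tasks that are not RUNNING."""
    status = resource.get("status", {}).get("connectorStatus", {})
    problems = []
    connector_state = status.get("connector", {}).get("state")
    if connector_state != "RUNNING":
        problems.append(("connector", connector_state))
    for task in status.get("tasks", []):
        if task.get("state") != "RUNNING":
            problems.append((f"task-{task.get('id')}", task.get("state")))
    return problems

# Trimmed-down stand-in for `kubectl get kafkaconnector amq-sink -o json`.
sample = json.loads("""
{
  "status": {
    "connectorStatus": {
      "connector": {"state": "RUNNING", "worker_id": "10.116.0.66:8083"},
      "name": "amq-sink",
      "tasks": [{"id": 0, "state": "FAILED", "worker_id": "10.116.0.66:8083"}],
      "type": "sink"
    }
  }
}
""")

print(unhealthy_parts(sample))
```

What to do with a non-empty result (alerting, restarting the task) is left to whoever owns the connectors.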
This may be late, but it could help anyone who passes by this question: it is worth having a look at the Kafka Connect custom resources (CRs) that are part of Confluent for Kubernetes (CFK), which introduce a clear-cut declarative way to manage and monitor connectors, with health checks and auto-healing.
https://www.confluent.io/blog/declarative-connectors-with-confluent-for-kubernetes/

Logging and event tracer on Kubernetes

Is there any way of getting merged logs from more than one deployment on Kubernetes? What's the best way of logging events for all deployments?
Look into the Elasticsearch, Logstash and Kibana (ELK) stack, with Filebeat or Fluentd to ship log data from the individual deployments/pods to your Elasticsearch DB. Once the data is in your DB, use Kibana to visualize and search your merged logs. Logstash can be used to modify your data in flight. A simple Google search should yield a lot of resources on doing this.
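Conceptually, the "merged logs" that such a stack provides are a single time-ordered view over per-deployment streams. A minimal Python sketch of that idea on in-memory data, assuming each line starts with an ISO timestamp and each stream is already in time order; the deployment names and lines are hypothetical:

```python
import heapq
from datetime import datetime

def parse(line, source):
    """Split 'ISO-timestamp message' and tag the record with its source."""
    stamp, _, message = line.partition(" ")
    return (datetime.fromisoformat(stamp), source, message)

def tagged(source, lines):
    """Yield parsed records for one deployment's stream."""
    for line in lines:
        yield parse(line, source)

def merged(streams):
    """Merge time-ordered per-deployment streams into one time-ordered stream."""
    return heapq.merge(*(tagged(name, lines) for name, lines in streams.items()))

streams = {
    "orders": ["2020-12-07T10:30:28 order received", "2020-12-07T10:30:31 order stored"],
    "billing": ["2020-12-07T10:30:29 invoice created", "2020-12-07T10:30:30 invoice sent"],
}

for stamp, source, message in merged(streams):
    print(stamp.isoformat(), source, message)
```

Elasticsearch does the equivalent at query time by sorting on the timestamp field, so the shippers do not need to coordinate with each other.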

Logging Kubernetes with an external ELK stack

Is there any documentation out there on sending logs from containers in K8s to an external ELK cluster running on EC2 instances?
We're in the process of trying to set up Kubernetes, and I'm trying to figure out how to get the logging to work correctly. We already have an ELK stack set up on EC2 for current versions of the application, but most of the documentation out there seems to refer to ELK deployed inside the K8s cluster.
I am also working on the same cause.
First, you should know which logging driver your Docker containers use to manage the logs (json-file, journald, etc.).
After that, you should add a log collector to your architecture to send the logs to the Logstash endpoint. You can use Filebeat or Fluent Bit; they are lightweight alternatives to Logstash and Fluentd respectively. You should use one of them rather than sending your logs directly to Logstash via syslog, because these log shippers can enrich your logs with the Kubernetes metadata of the respective containers.
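One input such shippers can use for this enrichment is the name of the log file itself: the kubelet links container logs under /var/log/containers/ with the pod name, namespace, container name and container id encoded in the file name. A minimal Python sketch of that first step, with a hypothetical path; real shippers go further and look up labels and annotations through the Kubernetes API:

```python
import os
import re

# <pod>_<namespace>_<container>-<64-hex-char container id>.log
LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)-(?P<container_id>[0-9a-f]{64})\.log$"
)

def metadata_from_path(path):
    """Extract pod, namespace, container and container id from a container log path."""
    match = LOG_NAME.match(os.path.basename(path))
    return match.groupdict() if match else None

# Hypothetical log file of an "app" container in pod "web-7d9f" in namespace "prod".
path = "/var/log/containers/web-7d9f_prod_app-" + "0123456789abcdef" * 4 + ".log"
print(metadata_from_path(path))
```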
There might be a lot of challenges after that, such as parsing the log data (multiline logs, for example). For an efficient pipeline, it's better to do most of the work (e.g. extracting the date from the logs) on the log sender side than in the shared Logstash instance, which might become a bottleneck.
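As an illustration of the multiline problem, here is a minimal Python sketch of what a shipper's multiline setting does: lines that do not start a new event are appended to the previous one. It assumes every new event starts with a "YYYY-MM-DD HH:MM:SS" timestamp, and the sample lines are hypothetical:

```python
import re

# A line starting with a timestamp opens a new event; anything else continues it.
NEW_EVENT = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")

def join_multiline(lines):
    """Yield complete events, folding continuation lines into the preceding event."""
    event = None
    for line in lines:
        if event is None or NEW_EVENT.match(line):
            if event is not None:
                yield event
            event = line
        else:
            event += "\n" + line
    if event is not None:
        yield event

raw = [
    "2020-12-07 10:30:28 ERROR request failed",
    "java.lang.IllegalStateException: boom",
    "    at com.example.Handler.handle(Handler.java:42)",
    "2020-12-07 10:30:29 INFO request served",
]

for event in join_multiline(raw):
    print(repr(event))
```

Doing this on the sender keeps a stack trace together as one event before it reaches Logstash, where lines from many pods are already interleaved.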
Note that if the container logs are not sent to stdout/stderr but written elsewhere, you might need to run Filebeat/Fluent Bit as a sidecar with your containers.
As for links to documentation, I didn't find anything documented in a single place on this, but reading about the keywords I mentioned above taught me a lot.
Hope this helps.

way to check spring cloud stream source and sink data content

Is there any way I can check what data is in a Spring Cloud Data Flow stream source (say, a named destination ":mySource") and sink (say, "log" as the sink)?
e.g. dataflow:>stream create --name demo --definition ":mySource>log"
How can I check what is in mySource and in log here?
Do I have to check a Spring Cloud Data Flow log somewhere to get any clue, if it has logs at all? If so, where are the logs located in a Windows environment?
If you're interested in the payload content, you can deploy the stream with the DEBUG logs for the Spring Integration package, which will print the header + payload information among many other interesting lifecycle details. The logs will be either the payload consumed or produced depending on the application-type (i.e., source, processor, or sink).
In your case, you can view the payload consumed by the log-sink via:
dataflow:>stream create --name demo --definition ":mySource > log --logging.level.org.springframework.integration=DEBUG"
We have plans to add native provenance/lineage support with the help of Zipkin and Sleuth in future releases.