Is there any way to check what data is in a Spring Cloud Data Flow stream source (say, a named destination ":mySource") and sink (say, "log" as the sink)?
e.g. dataflow:>stream create --name demo --definition ":mySource>log"
In this case, how do I check what is in mySource and what reaches the log sink?
Do I have to look at the Spring Cloud Data Flow logs somewhere to get a clue, assuming it produces logs at all? If so, where are those logs located on Windows?
If you're interested in the payload content, you can deploy the stream with DEBUG logging enabled for the Spring Integration package, which will print the header and payload information along with many other lifecycle details. The logged payload is either what is consumed or what is produced, depending on the application type (i.e., source, processor, or sink).
In your case, you can view the payload consumed by the log-sink via:
dataflow:>stream create --name demo --definition ":mySource > log --logging.level.org.springframework.integration=DEBUG"
We have plans to add native provenance/lineage support with the help of Zipkin and Sleuth in future releases.
Related
I am looking for a solution to store all logs (Info, Debug, etc.) of a StreamSets pipeline (job) in S3 buckets.
Currently, logs are only available in the log console of the StreamSets UI.
Looking at the code, StreamSets uses log4j for all its logging, so you could use something like https://github.com/parvanov/log4j-s3 and configure a log appender that writes to S3.
An alternative is to create a new pipeline that consumes the logs and writes them to S3.
Logs are stored in the $SDC_HOME/log directory on the VM running SDC; from there you can copy them to S3. Alternatively, you can configure your own log path and map it to an S3 bucket at the OS level.
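For the copy-from-disk approach, even a periodic sync with the AWS CLI is enough; the bucket and prefix below are placeholders for your own setup:
# copy the SDC log directory to a (hypothetical) S3 bucket
aws s3 sync "$SDC_HOME/log" s3://my-log-bucket/sdc-logs/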
I'm designing a distributed application made up of several Spring microservices that will be deployed with Kubernetes. It is a batch processing app, and a typical request can take several minutes of processing, with the work distributed across the services using Kafka as a message broker.
A requirement of the project is that each request generates a log file, which needs to be stored on the application file store for retrieval. The current design is that all of the processing services write log messages (with the associated unique request ID) to Kafka, and a dedicated logging microservice consumes these messages, does some formatting, and persists them to the log file associated with the given request ID.
I'm very unfamiliar with how files should be stored in web applications. Should I be storing these log files on the local file system? If so, wouldn't that mean this "logging service" couldn't be scaled? For example, if I scaled the log service to 2 instances, each instance would in theory only have access to half of the log files. And if a user makes a request to retrieve a log file, there is no guarantee that the requested log file will be on whichever log service instance the Kubernetes load balancer routed them to.
What is the currently accepted "best practice" for having a file system in a distributed application? Or should I just accept that the logging service can never be scaled up?
A possible solution I can think of would be to store the text log files in our MySQL database as TEXT rows, making the logging service effectively stateless. If someone could point out any potential issues with this, that would be much appreciated.
deployed with Kubernetes
each request will generate a log file, which will need to be stored on the application file store
Don't do this. Use a Fluentd / Filebeat / Promtail / Splunk forwarder sidecar that gathers stdout from the container processes.
Or have your services write to a Kafka logs topic rather than creating files.
With either option, use a collector like Elasticsearch, Grafana Loki, or Splunk.
https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
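For reference, a minimal sketch of the sidecar pattern described on that page, assuming the app writes its per-request log files to a shared volume and a Fluentd sidecar picks them up (the image tags, paths, and names here are illustrative placeholders, not a recommendation):
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  containers:
  - name: app
    image: example/batch-worker:latest   # your Spring service (hypothetical image)
    volumeMounts:
    - name: request-logs
      mountPath: /var/log/requests       # app writes per-request log files here
  - name: log-shipper
    image: fluent/fluentd:v1.16-1        # sidecar logging agent (tag is illustrative)
    volumeMounts:
    - name: request-logs
      mountPath: /var/log/requests
      readOnly: true
  volumes:
  - name: request-logs
    emptyDir: {}                         # shared scratch volume, not durable storage
The shipping agent still needs its own configuration pointing at whichever collector you choose (Elasticsearch, Loki, Splunk, etc.); durability comes from the collector, not from the pod's emptyDir.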
wouldn't that mean this "logging service" couldn't be scaled?
No, each of these services is designed to be scaled.
possible solution I can think of would just store the text log files in our MySQL database as TEXT rows,
Sure, but Elasticsearch or Solr are purpose-built for gathering and searching plaintext, not MySQL.
Don't treat logs as something application-specific. In other words, your solution shouldn't be unique to Spring.
I am new to Sleuth and Zipkin. I have logged some messages, and Sleuth appends a trace id and span id to those messages. I am using Zipkin to visualize them, and I am able to see timings across the different microservices. Can we see the logger messages (that we put in the different microservices) in the Zipkin UI by trace id?
No, you can't. You can use tools like Elasticsearch, Logstash, and Kibana (ELK) to visualize them. You can go to my repo https://github.com/marcingrzejszczak/docker-elk and run ./getReadyForConference.sh; it will start Docker containers with the ELK stack, run the apps, and curl requests to the apps so that you can then check them in ELK.
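For context, Sleuth prepends the application name, trace id, and span id to each log line (the exact pattern depends on your Sleuth version and logging configuration), so once the logs are in Elasticsearch you can filter on the trace id you see in Zipkin. A purely illustrative, made-up line looks roughly like:
2021-06-01 10:15:30.123  INFO [order-service,5c8f7a2b9d3e4f10,9d3e4f105c8f7a2b] 1 --- [nio-8080-exec-1] c.e.o.OrderController : processing order 42
Here order-service, the ids, and the logger name are all invented for illustration; the bracketed trace and span ids are what you would search on in Kibana.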
I am trying to develop a working example of Snowplow click tracking. I have to set up the enrichment process to enrich raw data on a Kinesis stream. But when I run the JAR file, I get this error:
ERROR com.amazonaws.services.kinesis.leases.impl.LeaseManager - Failed to get table status for SnowplowEnrich-${enrich.streams.in.raw}
Is DynamoDB a necessity for enrichment process?
It depends: in batch mode, DynamoDB is not necessary for the enrichment process; DynamoDB is used in the RDB Shredder.
Which release are (or were) you trying to install? For a PoC you can use Snowplow Mini.
The Snowplow community is active at discourse.snowplowanalytics.com.
How can I centrally log my Spring Boot REST services, which run as different applications on the Cloud Foundry platform? For example, I want to log how often a particular service is requested. The logs should also persist even if I have to restart or reset my application; I don't want to see only the most recent log entries with cf logs --recent. Is there a best practice?
Your applications should configure their logging to write to stdout and stderr. The Cloud Foundry logging subsystem will automatically pick up everything written to stdout and stderr and send it to the log aggregator. See the Application Logging docs for more info.
To persist the logs and make them available for viewing and analysis, they should be streamed to an external log capture system. The Cloud Foundry docs contain general information about configuring log streaming as well as specific instructions for some popular log capture systems.
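One common way to set up the streaming is a user-provided service with a syslog drain bound to the app; the service name, app name, and drain URL below are placeholders for whatever your log capture system exposes:
cf create-user-provided-service my-log-drain -l syslog-tls://logs.example.com:6514
cf bind-service my-app my-log-drain
cf restage my-app
Restaging (or restarting) the app ensures the new drain binding takes effect; after that, everything the app writes to stdout/stderr is forwarded to the drain in addition to the normal Cloud Foundry log stream.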