Flume TAILDIR Source to Kafka Sink - Static Interceptor Issue

The scenario I'm trying to set up is as follows:
1- A Flume TAILDIR source reading from a log file and appending a static interceptor to the beginning of the message. The interceptor consists of the host name and the host IP, because they are required with every log message I receive.
2- A Flume Kafka producer sink that takes those messages from the file and puts them in a Kafka topic.
The Flume configuration is as follows:
tier1.sources=source1
tier1.channels=channel1
tier1.sinks =sink1
tier1.sources.source1.interceptors=i1
tier1.sources.source1.interceptors.i1.type=static
tier1.sources.source1.interceptors.i1.key=HostData
tier1.sources.source1.interceptors.i1.value=###HostName###000.00.0.000###
tier1.sources.source1.type=TAILDIR
tier1.sources.source1.positionFile=/usr/software/flumData/flumeStressAndKafkaFailureTestPos.json
tier1.sources.source1.filegroups=f1
tier1.sources.source1.filegroups.f1=/usr/software/flumData/flumeStressAndKafkaFailureTest.txt
tier1.sources.source1.channels=channel1
tier1.channels.channel1.type=file
tier1.channels.channel1.checkpointDir = /usr/software/flumData/checkpoint
tier1.channels.channel1.dataDirs = /usr/software/flumData/data
tier1.sinks.sink1.channel=channel1
tier1.sinks.sink1.type=org.apache.flume.sink.kafka.KafkaSink
tier1.sinks.sink1.kafka.bootstrap.servers=<Removed For Confidentiality >
tier1.sinks.sink1.kafka.topic=FlumeTokafkaTest
tier1.sinks.sink1.kafka.flumeBatchSize=20
tier1.sinks.sink1.kafka.producer.acks=0
tier1.sinks.sink1.useFlumeEventFormat=true
tier1.sinks.sink1.kafka.producer.linger.ms=1
tier1.sinks.sink1.kafka.producer.client.id=HOSTNAME
tier1.sinks.sink1.kafka.producer.compression.type = snappy
So now I'm testing: I ran a console Kafka consumer, started writing to the source file, and I do receive the messages with the header appended.
Example:
I write 'test' in the source file, press Enter, then save the file.
Flume detects the file change and sends the new line to the Kafka producer.
My consumer gets the following line:
###HostName###000.00.0.000###test
The issue is that sometimes the interceptor doesn't work as expected. It's as if Flume sends 2 messages: one containing the interceptor value and the other the message content.
Example:
I write 'hi you' in the source file, press Enter, then save the file.
Flume detects the file change and sends the new line to the Kafka producer.
My consumer gets the following 2 lines:
###HostName###000.00.0.000###
hi you
And the terminal scrolls to the new message content.
This always happens when I type 'hi you' in the text file, and since I'm reading from a log file, it's not predictable when it will happen.
Help and support will be much appreciated ^^
Thank you

So the problem was with the Kafka consumer. It receives the full message from Flume as
Interceptor + some garbage characters + message
and if one of the garbage characters is \n (LF on Linux systems), it assumes it is 2 messages, not 1.
I'm using the Kafka Consumer stage in StreamSets, so it's simple to change the message delimiter. I set it to \r\n and now it's working fine.
If you are dealing with the full message as a string and want to apply a regex to it or write it to a file, it's better to replace \r and \n with an empty string.
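For instance, if the consumed record value ends up as a plain Java String, a minimal cleanup sketch (variable names are placeholders) would be:

// rawMessage is the record value read from Kafka (placeholder name).
// Stripping CR and LF keeps the interceptor header and the log text on one logical line.
String cleaned = rawMessage.replace("\r", "").replace("\n", "");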
The full walkthrough of the answer can be found here:
https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-TAILDIR-Source-to-Kafka-Sink-Static-Interceptor-Issue/m-p/86388#M3508

Related

How to get full Message with KafkaItemReader in SCDF Task?

I'm trying to create an SCDF Task to handle errors, but I can't figure out how to get the full Kafka message with payload and headers.
The idea is to route messages to a DLQ in my streams when a service is not responding. For example, some HTTP service is down and the httpclient app is failing.
When the HTTP service is back up, I would like to run a task which takes the messages in the DLQ and resends them to the proper Kafka topic, no matter what the message is.
I'm trying to make a generic task, so the DLQ and target topic are Kafka consumer and producer properties.
And I would like to use the generic org.springframework.messaging.Message too.
When I use KafkaItemReader<String, String> and KafkaItemWriter<String, String>, it works fine with only the payload as a String, but all headers are lost. When I use KafkaItemReader<String, Message<?>> and KafkaItemWriter<String, Message<?>> to also get the headers, I get a ClassCastException: java.lang.String cannot be cast to org.springframework.messaging.Message
java.lang.ClassCastException: java.lang.String cannot be cast to org.springframework.messaging.Message
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.doProcess(SimpleChunkProcessor.java:134) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.transform(SimpleChunkProcessor.java:319) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:210) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:77) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331) ~[spring-batch-core-4.2.4.RELEASE.jar:4.2.4.RELEASE]
Is there a way to do this?
In fact, it seems that there is no way to get message headers with KafkaItemReader and KafkaItemWriter. Serializers/deserializers are used for the key and payload, but I can't find a way to get the headers.
I solved this by using a Tasklet instead of KafkaItemReader and KafkaItemWriter. In my Tasklet, I use KafkaConsumer and KafkaProducer to deal with ConsumerRecord and ProducerRecord, which allows me to copy the headers.
Moreover, I can handle commits more properly (no auto-commit): consumer offsets are committed only after the messages have been sent by the producer.
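A rough sketch of such a Tasklet, assuming byte[] key/value (de)serializers and enable.auto.commit=false in the supplied consumer properties; the class name and topic names are placeholders, not the exact original code:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

// Hypothetical Tasklet that drains a DLQ topic and replays each record, headers included.
public class DlqReplayTasklet implements Tasklet {

    private final String dlqTopic;           // DLQ topic to drain (placeholder)
    private final String targetTopic;        // topic to replay to (placeholder)
    private final Properties consumerProps;  // expects enable.auto.commit=false
    private final Properties producerProps;

    public DlqReplayTasklet(String dlqTopic, String targetTopic,
                            Properties consumerProps, Properties producerProps) {
        this.dlqTopic = dlqTopic;
        this.targetTopic = targetTopic;
        this.consumerProps = consumerProps;
        this.producerProps = producerProps;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(Collections.singletonList(dlqTopic));
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(5));

            for (ConsumerRecord<byte[], byte[]> record : records) {
                // Copy key, value AND headers onto the outgoing record.
                producer.send(new ProducerRecord<>(targetTopic, null,
                        record.key(), record.value(), record.headers()));
            }

            // Flush first, then commit: offsets are committed only once the
            // producer has handed over the messages.
            producer.flush();
            consumer.commitSync();
        }
        return RepeatStatus.FINISHED;
    }
}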

Quarkus syslog message format

I am using Quarkus to log some events from the application to Elasticsearch as syslog; see the configuration:
quarkus.log.syslog.enable=true
quarkus.log.syslog.endpoint=elkhost:7001
quarkus.log.syslog.protocol=udp
quarkus.log.syslog.use-counting-framing=false
quarkus.log.syslog.app-name=MYAPP
quarkus.log.syslog.hostname=MYHOST
quarkus.log.syslog.level=ALL
quarkus.log.syslog.format=%m%n
Notice that %m is the only thing in the format, no other data. The RFC format is the default. And in Kibana I see:
<14>1 2020-02-29T11:43:06.001+03:00 MYHOST MYAPP 9348 test - test messagee
How must I set up Quarkus logging to write ONLY the message that was sent? Only "test messagee", without any text to the left of the message.
Reading the syslog RFC 5424, this format is not customizable in the way you expect.
A syslog message is composed of mandatory parts:
PRI
HEADER
MSG
So you cannot reduce a syslog message to its MSG part only.
Quarkus is not the restriction here; RFC 5424 is.
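For reference, the received line above maps roughly onto the mandatory RFC 5424 fields like this; only the trailing MSG part is what the %m%n format produced:

<14>1                           PRI + VERSION
2020-02-29T11:43:06.001+03:00   TIMESTAMP
MYHOST                          HOSTNAME
MYAPP                           APP-NAME
9348                            PROCID
test                            MSGID
-                               STRUCTURED-DATA (empty)
test messagee                   MSG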

MQ decoding cuts off dots (...) and changes the message length

We are using IBM MQ Series 9 and we are facing a decoding problem.
The messages are being sent from a mainframe with an encoding of 424 (Hebrew) to a Windows-based system. The system pulls the messages off the queue, parses them, and after that splits them into different parts for further parsing.
All messages might include Hebrew characters, hence I am obligated to use Hebrew encoding.
A message in the MQ can look like this:
9921388ABC.........3323DDFF.....43332FFF...2321......
After reading the message and parsing it using different code pages, the message either doesn't reach the system (when using 424 or 916) or reaches the system but looks like this:
9921388ABC3323DDFF43332FFF2321
The messages are shorter and unparseable.
I have tried to consult our MQ people, but they are clueless about this problem.
Would very much appreciate any kind of help.
Thank you.

mirth connect stop message propagation through destinations

I am using Mirth Connect 3.5.0.8232.
I have a Database Reader as the source connector and a JavaScript Writer as the destination connector. I decided to put some fancy code in the destination, doing four separate things that should follow one after the other. Basically, I wrote the code and it seemed too long and too clumsy, so I decided to split it into 4 destinations, daisy-chained via the "Wait for previous destination" option.
The question is : How do I interrupt this chain of execution if an error occurs on one of the destinations?
I found a JIRA issue from 2013 saying that errors occurring in the body of the destination connector would not prevent the message from going to all other destinations, and stating that the 2.x behavior is still current, i.e. an error occurring in the destination transformer will actually stop the message from propagating.
I tried throwing errors both in the destination body and in the destination response transformer, and in both cases the message would continue to the other destinations. I also tried returning ResponseFactory.getErrorResponse from the destination body with no luck, and setting responseStatus to ERROR in the destination response transformer to no avail. Did they mean the normal transformer/filter?
Also, maybe splitting a task into 4 distinct destinations is NOT what destinations were created for in the first place? I think the documentation states that destinations are basically what the actual word "destination" stands for.
If that is the case, maybe there are better ways of organizing the code functionally in Mirth? I think including external JS files is not allowed in a JavaScript Writer, and even if it were, I would prefer everything to sit inside the channel itself and be exportable/importable as a single file.
Thank you.
Yep, when an error is thrown from a filter/transformer, it's considered truly "exceptional" and so message flow is stopped (subsequent destinations in the same chain are not executed).
If an error is thrown from the actual destination dispatcher or from the response transformer, that destination is marked as ERROR, but subsequent destinations will still be executed.
You can still stop the message flow if you want though. Use filters on your subsequent destinations:
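For example, a JavaScript filter rule on each later destination can check the previous destination's status in the response map and reject the message when it errored. A rough sketch, where 'Destination 1' is a placeholder for your actual destination name:

// Filter rule: only accept the message if the previous destination did not error.
var prev = responseMap.get('Destination 1'); // use the real name of the previous destination
if (prev != null && prev.getStatus() == Status.ERROR) {
    return false; // message is filtered out of this destination
}
return true;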

syslog-ng with unix-stream destination

I am trying to configure a syslog-ng destination to use unix-stream sockets for inter-process communication. I have gone through this documentation: http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.3-guides/en/syslog-ng-ose-v3.3-guide-admin-en/html/configuring_destinations_unixstream.html
My syslog.conf (only part of it) is as follows:
source s_dxtcp { tcp(ip(0.0.0.0) port(514)); };
filter f_request {program("dxall");};
destination d_dxall_unixstream {unix-stream("/var/run/logs/all.log");};
log {source(s_dxtcp); filter(f_request); destination(d_dxall_unixstream);};
When I restart my syslog-ng server, I get the following message:
Connection failed; fd='11', server='AF_UNIX(/var/run/logs/all.log)',
local='AF_UNIX(anonymous)', error='Connection refused (111)'
Initiating connection failed, reconnecting; time_reopen='60'
What does this error signify? How can I use unix sockets with syslog-ng? Could anyone help me out?
So far I have not been able to create a Unix domain socket for inter-process communication, but I found a way around it. All I want is one-way communication to send data produced by syslog-ng to a running Java program (a process, I can say). I achieved this by using named pipes in syslog-ng. The documentation for this is http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.4-guides/en/syslog-ng-ose-v3.4-guide-admin/html-single/index.html#configuring-destinations-pipe .
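For illustration, a minimal pipe destination sketch reusing the source and filter from the question; the pipe path /var/run/logs/all.pipe is an assumption:

# The FIFO must exist first, e.g. created with: mkfifo /var/run/logs/all.pipe
destination d_dxall_pipe { pipe("/var/run/logs/all.pipe"); };
log { source(s_dxtcp); filter(f_request); destination(d_dxall_pipe); };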
Reading from a named pipe is the same as reading from a normal file. One important point to note is that the reader process (here the Java program) should be started before syslog-ng (the writer, which writes log messages to the named pipe).
The reason is that the writer will block until there is a reader, and the absence of a reader will lead to the loss of messages that accumulated before the reader started. There should also be only one instance of the reader: if there are multiple readers, the second reader will get a null pointer exception, as the message it wants to read has already been read by the first reader. Kindly note that this is from my experience; let me know if I am wrong.
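And a minimal sketch of the Java reader side, assuming the same pipe path as in the configuration sketch above:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PipeReader {
    public static void main(String[] args) throws IOException {
        // Opening the FIFO blocks until syslog-ng (the writer) opens it for writing.
        try (BufferedReader reader = new BufferedReader(new FileReader("/var/run/logs/all.pipe"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line is one log message from syslog-ng
            }
        }
    }
}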