Loading CSV data into Kafka - apache-kafka

I was working on event monitoring / microservice monitoring with Kafka, following the guide at https://rmoff.net/2020/06/17/loading-csv-data-into-kafka/.
When I got to the step
kafkacat -b kafka:29092 -t orders_spooldir_00 \
-C -o-1 -J \
-s key=s -s value=avro -r http://schema-registry:8081 | \
jq '.payload'
I got an error, and I'm not sure what went wrong: the Docker end or the server end?
I would appreciate any help on how I can proceed. Thanks!
% ERROR: Failed to query metadata for topic orders_spooldir_00: Local: Broker transport failure
kafka-connect | [2021-11-09 16:02:35,778] ERROR [source-csv-spooldir-00|worker] WorkerConnector{id=source-csv-spooldir-00} Error while starting connector (org.apache.kafka.connect.runtime.WorkerConnector:193)
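That "Broker transport failure" means kafkacat never managed to fetch metadata from kafka:29092. A minimal first check, assuming the guide's Docker Compose setup where kafka:29092 is an internal listener and a kafkacat container (hypothetical service name) sits on the same network:
# List brokers and topics from inside the Compose network; if this also fails,
# the broker container or its listener config is the problem, not the consumer command
docker exec kafkacat kafkacat -b kafka:29092 -L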

Related

Kafka: Source-Connector to Topic mapping is Flakey

I have the Kafka connector configuration below, and I have already created the "member" topic (30 partitions). The problem is that I will install the connector and it will work; i.e.
curl -d "@mobiledb-member.json" -H "Content-Type: application/json" -X PUT https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/mobiledb-member-connector/config
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/topics
returns:
{"member-connector":{"topics":[member]}}
the status call returns no errors:
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/mobiledb-member-connector/status
{"name":"member-connector","connector":{"state":"RUNNING","worker_id":"ttgssqa0vrhap81.***.net:8085"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"ttgssqa0vrhap81.***.net:8085"}],"type":"source"}
... but at other times, I will install a similar connector config and it will return no topics.
{"member-connector":{"topics":[]}}
Yet the status shows no errors and the Connector logs show no clues as to why this "connector to topic" mapping isn't working. Why aren't the logs helping out?
Connector configuration:
{
"connector.class":"io.confluent.connect.jdbc.JdbcSourceConnector",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"connection.url":"jdbc:sqlserver:****;",
"connection.user":"***",
"connection.password":"***",
"transforms":"createKey",
"table.poll.interval.ms":"120000",
"key.converter.schemas.enable":"false",
"value.converter.schemas.enable":"false",
"poll.interval.ms":"5000",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"name":"member-connector",
"tasks.max":"4",
"query":"SELECT * FROM member_kafka_test",
"table.types":"TABLE",
"topic.prefix":"member",
"mode":"timestamp+incrementing",
"transforms.createKey.fields":"member_id",
"incrementing.column.name": "member_id",
"timestamp.column.name" : "update_datetime"
}
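One note on the empty topics list: the /topics endpoint reports the worker's active-topics tracking (KIP-558), which only fills in once the connector task has actually produced records, and it can be reset. A minimal sketch against the same Connect REST API, assuming topic tracking is enabled on the worker (the default since Kafka 2.5):
# Show the topics this connector has actually written to so far
curl -s https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/topics
# Clear the tracked set so it repopulates as new records are produced
# (requires topic.tracking.allow.reset=true on the worker, also the default)
curl -s -X PUT https://TTGSSQA0VRHAP81.ttgtpmg.net:8085/connectors/member-connector/topics/reset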

Confluent Cloud -> BigQuery - How to diagnose "Bad records" cause

I can push data from MS SQL Server to topics on Confluent Cloud, but not from the topics to BigQuery; it throws the error "Bad records in the last hour - 65".
I am able to connect the topics to BigQuery, but not able to ingest the data.
The MSSQL and BigQuery table formats are the same:
first(string) last(string)
raj ram
Do I need to add any other columns to ingest the data, such as timestamp, offset, etc.?
If there are messages that can't be sent to the target, they'll get written to a Dead Letter Queue with details of the problem.
From the Connectors screen you can see the ID of your connector.
Use that ID to locate a topic with the same name and a dlq- prefix.
You can then browse that topic and use the header information to determine the cause of the problem.
If you prefer you can use kafkacat to view the headers:
$ docker run --rm edenhill/kafkacat:1.5.0 \
-X security.protocol=SASL_SSL -X sasl.mechanisms=PLAIN \
-X ssl.ca.location=./etc/ssl/cert.pem -X api.version.request=true \
-b ${CCLOUD_BROKER_HOST} \
-X sasl.username="${CCLOUD_API_KEY}" \
-X sasl.password="${CCLOUD_API_SECRET}" \
-t dlq-lcc-emj3x \
-C -c1 -o beginning \
-f 'Topic %t[%p], offset: %o, Headers: %h'
Topic dlq-lcc-emj3x[0], offset: 12006, Headers: __connect.errors.topic=mysql-01-asgard.demo.transactions,__connect.errors.partition=5,__connect.errors.offset=90,__connect.errors.connector.name=lcc-emj3x,__connect.errors.task.id=0,__connect.errors.stage=VALUE_CONVERTER,__connect.errors.class.name=org.apache.kafka.connect.json.JsonConverter,__connect.errors.exception.class.name=org.apache.kafka.connect.errors.DataException,__connect.errors.exception.message=Converting byte[] to Kafka Connect data failed due to serialization error: ,__connect.errors.exception.stacktrace=org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
at org.apache.kafka.connect.json.JsonConverter.toConnectData(JsonConverter.java:344)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:487)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:464)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:320)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:177)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:227)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Illegal character ((CTRL-CHAR, code 0)): only regular white space (\r, \n, \t) is allowed between tokens
at [Source: (byte[])"
From there on in it's just a case of understanding the error. A lot of the time it's down to serialisation issues, which you can learn more about here.
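For reference, the __connect.errors.* headers in the example above are the standard Kafka Connect DLQ context headers. On a self-managed sink connector (Confluent Cloud manages this for you) they come from error-handling settings roughly like this sketch; the DLQ topic name is a placeholder:
# Hedged sketch: route bad records to a DLQ topic and attach context headers
errors.tolerance=all
errors.deadletterqueue.topic.name=dlq-my-sink
errors.deadletterqueue.context.headers.enable=true
errors.log.enable=true
errors.log.include.messages=true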

KSQL Stream output Topic

Hi, I have left-joined a KSQL stream (SEARCHREQUESTDTO) with a KSQL table (NGINX_TABLE) using the following KSQL command:
CREATE STREAM NIGINX_SEARCH_QUERY AS \
SELECT *\
FROM SEARCHREQUESTDTO\
LEFT JOIN NGINX_TABLE\
ON SEARCHREQUESTDTO.sessionid = NGINX_TABLE.sessionid;
The resulting stream NIGINX_SEARCH_QUERY is created successfully, and I can also see the NIGINX_SEARCH_QUERY topic using the SHOW TOPICS command in the KSQL terminal.
When I try to connect a Kafka consumer to this topic, the consumer is not able to fetch any data,
but the PRINT NIGINX_SEARCH_QUERY command shows that data is being published to this topic.
If PRINT shows output, then the topic does exist and has data.
If your consumer doesn't show output, then that's a problem with your consumer. So I would rephrase your question as: "I have a Kafka topic that my consumer does not show data for."
I would use kafkacat to check the topic externally:
kafkacat -b kafka-broker:9092 -C -K: \
-f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\n\tPartition: %p\tOffset: %o\n--\n' \
-t NIGINX_SEARCH_QUERY
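If you'd rather use the console consumer that ships with Kafka, make sure it reads from the start of the topic; started without --from-beginning it will sit silently until new rows are produced. A minimal check (the broker address is a placeholder):
# Read the topic from the earliest offset rather than only new messages
kafka-console-consumer.sh --bootstrap-server kafka-broker:9092 \
  --topic NIGINX_SEARCH_QUERY --from-beginning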

rsyslog doesn't write log to kafka

I'm new to rsyslog and Kafka, and I ran into some trouble when trying to get the following log stream working:
nginx log -> rsyslog-imudp -> rsyslog-omkafka -> kafka
Here is the nginx conf:
log_format jsonlog '{'
'"host": "$host",'
'"server_addr": "$server_addr",'
'"http_x_forwarded_for":"$http_x_forwarded_for",'
'"remote_addr":"$remote_addr",'
'"time_local":"$time_local",'
'"request_method":"$request_method",'
'"request_uri":"$request_uri",'
'"status":$status,'
'"body_bytes_sent":$body_bytes_sent,'
'"http_referer":"$http_referer",'
'"http_user_agent":"$http_user_agent",'
'"upstream_addr":"$upstream_addr",'
'"upstream_status":"$upstream_status",'
'"upstream_response_time":"$upstream_response_time",'
'"request_time":$request_time'
'}';
access_log syslog:server=server_ip,facility=local7,tag=nginx_access_log jsonlog;
And the rsyslog conf:
module(load="imudp")
input(type="imudp" port="514")
module(load="omkafka")
template(name="nginxLog" type="string" string="%msg%")
if $inputname == "imudp" then {
action(type="omkafka"
template="nginxLog"
broker=["localhost:9092"]
topic="rsyslog_logstash"
partitions.auto="on"
confParam=[
"socket.keepalive.enable=true"
]
)
}
Unfortunately, I don't get any output in the consumer terminal:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic rsyslog_logstash --from-beginning
Maybe it's the template, but I cannot find much documentation about it.
Use the rsyslog impstats module to check how many messages your imudp input has received, and how many have been sent on to omkafka; a sketch of enabling it is shown below.
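A minimal sketch of loading it; the interval and output file are just example values:
# Emit pstats counters every 60 seconds to a file instead of syslog, so you can
# compare the imudp submitted count against the omkafka queue counters
module(load="impstats"
       interval="60"
       severity="7"
       log.syslog="off"
       log.file="/var/log/rsyslog-stats.log")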
For debugging, stop rsyslog and run it with:
rsyslogd -nd
(I suggest running it with rsyslogd -n, checking your Kafka topic again, and then debugging it.)
Probably you have a problem with SELinux, which can be resolved with this:
sudo semanage port -d -t unreserved_port_t -p tcp 9092
sudo semanage port -a -t http_port_t -p tcp 9092

Terminate Kafka Console Consumer when all the messages have been read

I know there has to be a way to do this, but I am not able to figure it out. I need to stop the Kafka consumer once I have read all the messages from the queue.
Can somebody provide any info on this?
You can pass the parameter -consumer-timeout-ms with a value when starting the consumer, and it will throw an exception if no messages have been read during that time. For example, to stop the consumer if no new messages have arrived in the last 2 seconds:
kafka.consumer.ConsoleConsumer -consumer-timeout-ms 2000
You can see this and all the other input options here.
Currently, Kafka version 2.11-2.1.1 has a script called kafka-console-consumer.sh.
It has a new flag: --timeout-ms.
Basically, this flag is the maximum time to wait before exiting when there are no new messages; it is specified in milliseconds.
You can use this property to end your console consumer after reading all the messages.
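For example (a sketch; the broker and topic names are placeholders):
# Exit once no new messages have arrived for 2 seconds
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic mytopic --from-beginning --timeout-ms 2000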
You can use SimpleConsumerShell with the no-wait-at-logend option. See SystemTools-SimpleConsumerShell.
For example:
./kafka-run-class.bat kafka.tools.SimpleConsumerShell --broker-list localhost:9092 --topic kafkademo --partition 0 --no-wait-at-logend
If you are not dead set on using the Scala client, try kafkacat with the -e option, which tells it to exit when the end of the partition has been reached.
E.g. to consume all messages from mytopic partition 2 and then exit:
$ kafkacat -b mybroker -t mytopic -p 2 -o beginning -e
Or consume the last 3000 messages and then exit:
$ kafkacat -b mybroker -t mytopic -p 2 -o -3000 -e