Insert data into redis from kafka via redis-sink-connector by splitting the record key - apache-kafka

I'm trying to find a reference for storing Kafka records in Redis via a Kafka Redis sink connector, splitting the record keys in the following way.
Example of Kafka records and the expected Redis operations: \
`<key>` `<value>` ========> `expected redis operation` \
`<User1|state1|id1>` `<data>` ========> `redis=> hset "User1|state1" "id1" data` \
`<User1|state2|id2>` `<data>` ========> `redis=> hset "User1|state2" "id2" data` \
`<User2|state1|id3>` `<data>` ========> `redis=> hset "User2|state1" "id3" data` \
`<User2|state2|id4>` `<data>` ========> `redis=> hset "User2|state2" "id4" data`
So the hash for each user-state mapping will contain all of its mapped ids, and running HGETALL in Redis on User1|state1 retrieves all the ids and data:
`HGETALL` `"User1|state1"`
id1
data
id2
data
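The key handling the question asks for can be sketched outside any connector: split the record key on its last `|` into a Redis hash key and a field name, then issue HSET. Below is a minimal illustration in Python with redis-py (assumptions: the `redis` package is installed, a local Redis instance is running, keys always contain exactly two `|` separators, and values are plain strings):

```python
import redis  # assumption: redis-py is installed

r = redis.Redis(host="localhost", port=6379)

def write_record(kafka_key: str, kafka_value: str) -> None:
    """Split 'User1|state1|id1' into hash key 'User1|state1' and field 'id1'."""
    hash_key, field = kafka_key.rsplit("|", 1)
    r.hset(hash_key, field, kafka_value)

# The example records from the question
for key, value in [
    ("User1|state1|id1", "data"),
    ("User1|state2|id2", "data"),
    ("User2|state1|id3", "data"),
    ("User2|state2|id4", "data"),
]:
    write_record(key, value)

print(r.hgetall("User1|state1"))  # {b'id1': b'data'}
```

A sink connector would apply the same `rsplit` logic to each record key inside its put() path before issuing the HSET.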

Related

Kafka Connect error: com.aerospike.connect.inbound.aerospike.exception.ConvertToAerospikeException: user key missing from record

I am trying to ingest data from Kafka into Aerospike. What am I missing in the Kafka message being sent?
I am sending the data below into Kafka to be pushed into Aerospike:
ubuntu#ubuntu-VirtualBox:/opt/kafka_2.13-2.8.1$ bin/kafka-console-producer.sh --topic phone --bootstrap-server localhost:9092
>{"schema":{"type":"struct","optional":false,"version":1,"fields":[{"field":"name","type":"string","optional":true}]},"payload":{"name":"Anuj"}}
Kafka connect gives the below error:
com.aerospike.connect.inbound.aerospike.exception.ConvertToAerospikeException: user key missing from record
[2021-12-13 21:33:34,747] ERROR failed to put record SinkRecord{kafkaOffset=13, timestampType=CreateTime} ConnectRecord{topic='phone', kafkaPartition=0, key=null, keySchema=null, value=Struct{name=Anuj}, valueSchema=Schema{STRUCT}, timestamp=1639411413702, headers=ConnectHeaders(headers=)} (com.aerospike.connect.kafka.inbound.AerospikeSinkTask:288)
com.aerospike.connect.inbound.aerospike.exception.ConvertToAerospikeException: user key missing from record
at com.aerospike.connect.inbound.converter.AerospikeRecordConverter.extractUserKey(AerospikeRecordConverter.kt:131)
at com.aerospike.connect.inbound.converter.AerospikeRecordConverter.extractKey(AerospikeRecordConverter.kt:68)
at com.aerospike.connect.inbound.converter.AerospikeRecordConverter.extractRecord(AerospikeRecordConverter.kt:41)
at com.aerospike.connect.kafka.inbound.KafkaInboundDefaultMessageTransformer.transform(KafkaInboundDefaultMessageTransformer.kt:69)
at com.aerospike.connect.kafka.inbound.KafkaInboundDefaultMessageTransformer.transform(KafkaInboundDefaultMessageTransformer.kt:25)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.applyTransform(AerospikeSinkTask.kt:341)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.toAerospikeOperation(AerospikeSinkTask.kt:315)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.putRecord(AerospikeSinkTask.kt:239)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask.access$putRecord(AerospikeSinkTask.kt:47)
at com.aerospike.connect.kafka.inbound.AerospikeSinkTask$put$2$2.invokeSuspend(AerospikeSinkTask.kt:220)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2021-12-13 21:33:35,458] INFO 1 errors for topic phone (com.aerospike.connect.kafka.inbound.AerospikeSinkTask:552)
aerospike-kafka-inbound.yml file:
# Change the configuration for your use case.
#
# Refer to https://www.aerospike.com/docs/connect/streaming-to-asdb/from-kafka-to-asdb-overview.html
# for details.
# Map of Kafka topic name to its configuration.
topics:
  phone: # Kafka topic name.
    invalid-record: ignore # Do not kill the task on an invalid record.
    mapping: # Config to convert Kafka record to Aerospike record.
      namespace: # Aerospike record namespace config.
        mode: static
        value: test
      set: # Aerospike record set config.
        mode: static
        value: t1
      key-field: # Aerospike record key config.
        source: key # Use Kafka record key as the Aerospike record key.
      bins: # Aerospike record bins config.
        type: multi-bins
        # all-value-fields: true # Convert all values in Kafka record to Aerospike record bins.
        map:
          name:
            source: value-field
            field-name: firstName

# The Aerospike cluster connection properties.
aerospike:
  seeds:
    - 127.0.0.1:
        port: 3000
It looks like you are not specifying a key when you send your Kafka message. By default the console producer sends a null key, and your config says to use the Kafka key as the Aerospike key. To send a Kafka key you need to set `parse.key` to true and specify what your separator will be (in the Kafka producer).
see step 8 here
https://kafka-tutorials.confluent.io/kafka-console-consumer-producer-basics/kafka.html
kafka-console-producer \
--topic orders \
--bootstrap-server broker:9092 \
--property parse.key=true \
--property key.separator=":"
The two properties tell the Kafka producer to expect a key in your messages and which separator distinguishes the key from the value.
In this example there are two records, one with the key foo and the other with the key fun.
foo:bar
fun:programming
This will result in those two records being written to Aerospike with primary keys matching the Kafka keys foo and fun.
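Applied to the original `phone` example, the point is simply that the record must carry a non-null key for the `source: key` mapping to resolve. As an alternative to the console producer, here is a hedged sketch of sending the same envelope with an explicit key from Python using kafka-python (assumptions: the `kafka-python` package is installed, the broker is at localhost:9092, and the key value `phone-1` is purely illustrative):

```python
import json
from kafka import KafkaProducer  # assumption: kafka-python is installed

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The schema/payload envelope from the question, now sent with an explicit key
# so the Aerospike sink can use it as the record's user key.
value = {
    "schema": {"type": "struct", "optional": False, "version": 1,
               "fields": [{"field": "name", "type": "string", "optional": True}]},
    "payload": {"name": "Anuj"},
}
producer.send("phone", key="phone-1", value=value)
producer.flush()
```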

How does the JDBC sink connector insert values into a Postgres database

I'm using the JDBC sink connector to load data from a Kafka topic into a Postgres database.
Here is my configuration:
curl --location --request PUT 'http://localhost:8083/connectors/sink_1/config' \
--header 'Content-Type: application/json' \
--data-raw '{
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/postgres",
    "connection.user": "user",
    "connection.password": "passwd",
    "tasks.max": "10",
    "topics": "<topic_name_same_as_tablename>",
    "insert.mode": "insert",
    "key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "quote.sql.identifiers": "never",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "failed_records",
    "errors.deadletterqueue.topic.replication.factor": "1",
    "errors.log.enable": "true"
}'
My table has 100k+ records, so I partitioned the topic into 10 partitions and set tasks.max to 10 to speed up the load, which was much faster than with a single partition.
Can someone help me understand how the sink connector loads data into Postgres? Which insert statement will it use, approach-1 or approach-2? If it is approach-1, can we achieve approach-2, and if so, how?
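For context, the JDBC sink generally prepares one parameterized INSERT statement per target table and executes it in batches (the connector exposes a batch size setting), rather than issuing a separate round trip per record. As a rough illustration only, not the connector's actual Java implementation, here is what the batched-insert pattern looks like in Python with psycopg2 (assumptions: psycopg2 is installed, a table `my_table(id, name)` exists, and the connection string is a placeholder):

```python
import psycopg2  # assumption: psycopg2 is installed
from psycopg2.extras import execute_batch

# Placeholder connection details
conn = psycopg2.connect("dbname=postgres user=user password=passwd host=localhost port=5432")

records = [(1, "a"), (2, "b"), (3, "c")]  # rows converted from Kafka records

with conn, conn.cursor() as cur:
    # One parameterized INSERT executed for many rows per round trip,
    # instead of a separate statement per record.
    execute_batch(
        cur,
        "INSERT INTO my_table (id, name) VALUES (%s, %s)",
        records,
        page_size=1000,
    )
```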

Databricks: Azure Queue Storage structured streaming key not found error

I am trying to write an ETL pipeline for AQS streaming data. Here is my code:
from pyspark.sql.types import StructType, StructField, IntegerType, TimestampType

CONN_STR = dbutils.secrets.get(scope="kvscope", key="AZURE-STORAGE-CONN-STR")

schema = StructType([
    StructField("id", IntegerType()),
    StructField("parkingId", IntegerType()),
    StructField("capacity", IntegerType()),
    StructField("freePlaces", IntegerType()),
    StructField("insertTime", TimestampType()),
])

stream = spark.readStream \
    .format("abs-aqs") \
    .option("fileFormat", "json") \
    .option("queueName", "freeparkingplaces") \
    .option("connectionString", CONN_STR) \
    .schema(schema) \
    .load()

display(stream)
When I run this I get java.util.NoSuchElementException: key not found: eventType.
Here is how my queue looks:
Can you spot and explain what the problem is?
The abs-aqs connector isn't for consuming data from AQS; it's for discovering new files in blob storage using events reported to AQS. That's why you specify the file format option and schema, but these parameters are applied to the files, not to the messages in AQS.
As far as I know (I could be wrong), there is no Spark connector for AQS, and it's usually recommended to use Event Hubs or Kafka as the messaging solution.
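Since the JSON documents themselves live as files in the storage container (the queue only carries events about them), one way forward is to point the stream at those files directly. A minimal hedged sketch using the plain JSON file source with the same schema as above (assumptions: the container path is a placeholder and storage credentials are already configured on the cluster):

```python
# Read the JSON files that the AQS events point to, using the schema defined above.
# The path is a placeholder; adjust it to the actual container/folder.
files_stream = (
    spark.readStream
    .format("json")
    .schema(schema)
    .load("abfss://parking@mystorageaccount.dfs.core.windows.net/freeparkingplaces/")
)

display(files_stream)
```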

Should a topic created via kafka-topics automatically have associated subjects created?

I'm attempting to mimic 'confluent load' (which isn't recommended for production use) to add connectors that automatically create the topic, subject, etc., so that KSQL streams and tables can be created. I'm using curl to interact with the REST interface.
When kafka-topics is used to create topics, does this also create the associated subjects for "topicName-value", etc.?
$ curl -X GET http://localhost:8082/topics | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 199 100 199 0 0 14930 0 --:--:-- --:--:-- --:--:-- 15307
[
"Topic_OracleSource2"
]
curl -X GET http://localhost:8081/subjects | jq
[]
Nothing shows. However, performing this curl request:
curl -X POST -H "Content-Type: application/vnd.kafka.avro.v2+json" -H "Accept: application/vnd.kafka.v2+json" --data '{"value_schema": "{\"type\": \"record\", \"name\": \"User\", \"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}", "records": [{"value": {"name": "testUser"}}]}' "http://localhost:8082/topics/avrotest"
creates the subject:
curl -X GET http://localhost:8081/subjects | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 18 100 18 0 0 2020 0 --:--:-- --:--:-- --:--:-- 2250
[
"avrotest-value"
]
As far as I know, doing it this way isn't recommended, as topics are created on the fly rather than pre-created in a controlled environment.
The reason this question comes up is that the subject 'topicName-value/key' pair seems to be needed to create streams for the topic inside KSQL.
Without the subjects, I can see data coming across from the Avro-based connector I created, but I can't perform further transformations using KSQL streams and tables.
kafka-topics only interacts with Zookeeper and Kafka. It has no notion of the existence of a Schema Registry.
The process that creates the Avro schema / subject is the Avro serializer configuration via the producer. If a Kafka Connect source is configured with the AvroConverter, it will register a schema itself upon getting data, so you should not need curl, assuming you are satisfied with the generated schema.
To my knowledge, there's no way to prevent KSQL from auto-registering a schema in the registry.
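For reference, configuring a source connector with the AvroConverter over the Connect REST API looks roughly like the sketch below, written in Python with `requests` (assumptions: the connector name, class, and connection settings are placeholders; Connect is on localhost:8083 and Schema Registry on localhost:8081). Once such a connector produces data, the `<topic>-value` subject is registered without any manual curl call:

```python
import requests  # assumption: requests is installed

# Placeholder connector config; the converter settings are the relevant part:
# with AvroConverter, the <topic>-value subject is registered automatically
# once the connector starts producing records.
config = {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",  # placeholder source
    "connection.url": "jdbc:oracle:thin:@//localhost:1521/ORCL",          # placeholder
    "topic.prefix": "Topic_",
    "mode": "bulk",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081",
}

resp = requests.put("http://localhost:8083/connectors/oracle-source/config", json=config)
print(resp.status_code, resp.json())
```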
seems the subject 'topicName-value/key' pair is needed to create streams for the topic inside KSQL.
If you want to use Avro, yes. But, no not "needed" for other data formats KSQL supports
can't further perform transformation using ksql stream and table.
You'll need to be more explicit about why that is. Are you getting errors?
When kafka-topics is used to create topics, does this also create the associated subjects for "topicName-value", etc.?
No, the subjects are not created automatically. (kafka-topics today doesn't even allow you to pass an Avro schema.)
Might be worth a feature request?

KSQL Stream output Topic

Hi, I have left joined a KSQL stream (SEARCHREQUESTDTO) with a KSQL table (NGINX_TABLE) using the following KSQL command:
CREATE STREAM NIGINX_SEARCH_QUERY AS \
SELECT *\
FROM SEARCHREQUESTDTO\
LEFT JOIN NGINX_TABLE\
ON SEARCHREQUESTDTO.sessionid = NGINX_TABLE.sessionid;
The resulting stream NIGINX_SEARCH_QUERY is created successfully, and I can see the NIGINX_SEARCH_QUERY topic using the SHOW TOPICS command in the KSQL terminal.
But when I connect a Kafka consumer to this topic, the consumer is not able to fetch any data,
even though the PRINT NIGINX_SEARCH_QUERY command shows data being published to the topic.
If PRINT shows output then the topic does exist and has data.
If your consumer doesn't show output then that's an issue with your consumer. So I would rephrase your question as: I have a Kafka topic that my consumer does not show data for.
I would use kafkacat to check the topic externally:
kafkacat -b kafka-broker:9092 -C -K: \
-f '\nKey (%K bytes): %k\t\nValue (%S bytes): %s\n\tPartition: %p\tOffset: %o\n--\n' \
-t NIGINX_SEARCH_QUERY
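One common reason a consumer shows nothing for a topic that PRINT can read is that it starts at the latest offset while no new records are arriving. As an additional check, here is a hedged sketch of a Python consumer that reads the topic from the beginning (assumption: the `kafka-python` package is installed and the broker is reachable at kafka-broker:9092):

```python
from kafka import KafkaConsumer  # assumption: kafka-python is installed

consumer = KafkaConsumer(
    "NIGINX_SEARCH_QUERY",
    bootstrap_servers="kafka-broker:9092",
    auto_offset_reset="earliest",   # read existing records, not only new ones
    consumer_timeout_ms=10000,      # stop iterating after 10s with no messages
)

for msg in consumer:
    print(msg.partition, msg.offset, msg.key, msg.value)
```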