Debezium Server and using variables in the application.properties file - debezium

I'm trying to get Debezium Server running so that I can use GCP (Google) Pub/Sub and not have to use Kafka and the Kafka connectors. I have it mostly running; however, I'm having trouble using variables in the transforms section to define a topic name.
According to the documentation, when using the Outbox transformation I can choose the topic name by using the variable ${routedByValue} for the setting route.topic.replacement, and this will use the value determined by the setting route.by.field. If the replacement setting is omitted, a default topic name of outbox.event.<route.by.field value> is used.
When I try to use this variable in the 'application.properties' file ...
debezium.transforms.outbox.route.by.field=aggregate_type
debezium.transforms.outbox.route.topic.replace=${routedByValue}
... the Debezium Server stops with a NoSuchElementException, saying it cannot expand routedByValue. If I omit that setting, it works fine and defines the topic name as outbox.event.<route.by.field value>.
How can I use this variable correctly in the 'application.properties' file so I can customise the topic name (e.g. route.topic.replace=myservice.${routedByValue})?

The way I got this to work was to do the following ...
debezium.transforms.outbox.route.by.field=aggregate_type
debezium.transforms.outbox.route.topic.replacement=$1
I believe this works because another setting, debezium.transforms.outbox.route.topic.regex, is omitted from the config, and it has a default value of (?<routedByValue>.*).
If I understand the documentation correctly, $1 refers to the first capture group in that regex. In my case, this returns whatever value aggregate_type evaluates to.
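Putting the pieces together, a minimal sketch of a full config along these lines (the regex line just spells out the documented default, and myservice is the prefix from the question):
debezium.transforms=outbox
debezium.transforms.outbox.type=io.debezium.transforms.outbox.EventRouter
debezium.transforms.outbox.route.by.field=aggregate_type
# the documented default, shown explicitly for clarity
debezium.transforms.outbox.route.topic.regex=(?<routedByValue>.*)
# $1 = the first capture group of route.topic.regex, i.e. the route.by.field value
debezium.transforms.outbox.route.topic.replacement=myservice.$1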

I'm using Debezium Server 2.1 with Pulsar as the sink type, and @Dazfl's answer solved my issue!
debezium.transforms=outbox
debezium.transforms.outbox.type=io.debezium.transforms.outbox.EventRouter
debezium.transforms.outbox.route.topic.replacement=outbox.event.transactions.$1
Although the Debezium Server docs say to use $routedByValue, this does not work as expected...

Related

Subject does not have subject-level compatibility configured

We use Kafka, Kafka Connect and Schema Registry in our stack. The version is 2.8.1 (Confluent 6.2.1).
We use Kafka Connect's converter configs (key.converter and value.converter) with the value io.confluent.connect.avro.AvroConverter.
It registers a new schema for topics automatically. But there's an issue: AvroConverter doesn't specify subject-level compatibility for a new schema,
and an error appears when we try to get the config for the schema via the REST API /config: Subject 'schema-value' does not have subject-level compatibility configured
If we specify the request parameter defaultToGlobal, then the global compatibility is returned. But that doesn't work for us because we cannot specify it in the request. We are using a 3rd-party UI: AKHQ.
How can I specify subject-level compatibility when registering a new schema via AvroConverter?
Last I checked, the only properties that can be provided to any of the Avro serializer configs that affect the Registry HTTP client are the URL, whether to auto-register schemas, and whether to use the latest schema version.
There's no property (or even method call) that sets either the subject-level or global config during schema registration.
You're welcome to check out the source code to verify this
But it doesn't work for us because we cannot specify it in the request. We are using 3rd party UI: AKHQ
Doesn't sound like a Connect problem. Create a PR for the AKHQ project to fix the request.
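For reference, the registry-related knobs the converter does expose look roughly like this (property names taken from the Confluent serde configs; note that none of them touches compatibility):
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
# whether the converter registers new schemas itself
value.converter.auto.register.schemas=true
# whether to use the latest already-registered version instead of registering
value.converter.use.latest.version=false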
As of 2021-10-26, I used the akhq 0.18.0 jar with confluent-6.2.0, and the schema registry in akhq works fine.
Note: I also tried confluent-6.2.1 and saw exactly the same error, so you may want to switch back to 6.2.0 to give it a try.
P.S.: I'm using all of this only for my local dev env (VirtualBox, Ubuntu).
@OneCricketeer is correct.
Unfortunately, there is no way to specify subject-level compatibility in AvroConverter.
I see only two solutions:
Override AvroConverter to add a property and the functionality to send an additional request to the /config/{subject} API after registering the schema (see the sketch at the end of this answer).
Contribute to AKHQ to support the defaultToGlobal parameter. But in this case we would also need to backport the schema-registry RestClient. Github issue
The second solution is preferable until the converter lets the user specify the compatibility level in its settings. Without such a setting in the native AvroConverter, we have to use a custom converter for every client that writes a schema, which takes a lot of effort.
To me it seems strange that the client cannot set the compatibility at the moment of registering the schema and instead has to use a separate request for it.
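As a sketch of the extra request that option 1 would have to send (assuming the registry is reachable at http://schema-registry:8081 and the subject is schema-value, as in the error message above):
# set subject-level compatibility for one subject via the Schema Registry REST API
curl -X PUT -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://schema-registry:8081/config/schema-value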

Kafka to Snowflake connection issue

I am trying to connect topics from a local standalone Confluent Kafka installation to Snowflake tables. I am using the following connector config via ksqlDB.
CREATE SINK CONNECTOR `snowflake_sink` WITH(
"name"='snowflake_sink',
"tasks.max"='1',
"connector.class"='com.snowflake.kafka.connector.SnowflakeSinkConnector',
"topics"='USERPROFILE',
"snowflake.url.name"='https://mybu.mycompany.us-east-1.aws.privatelink.snowflakecomputing.com',
"snowflake.user.name”=‘myuser’,
"snowflake.database.name"='DEMO_DB',
"snowflake.topic2table.map"='USERPROFILE:UK_SF_DB1_Table1',
"snowflake.schema.name"='PUBLIC',
"snowflake.private.key”=‘<valid private key>’,
"snowflake.private.key.passphrase”=‘<valid pass phrase>’,
"key.converter"='org.apache.kafka.connect.storage.StringConverter',
"value.converter"='com.snowflake.kafka.connector.records.SnowflakeJsonConverter',
"key.converter.schema.registry.url"='http://schema-registry:8081',
"value.converter.schema.registry.url"='http://schema-registry:8081');
ksqlDB throws an error saying the Snowflake URL, username, and private key are not valid. They are all valid, as I can log in with the same credentials via the Snowflake web UI.
Some ideas on troubleshooting this:
It looks like you have some "smart quotes" in your config; please remove the smart single and double quotes and replace them with plain text quotes. This may have happened while preparing this question, so just double-check them.
Set your table name to all capital letters.
Your Snowflake account URL should not include the https:// prefix. Examples at these links:
https://community.snowflake.com/s/article/Docker-Compose-Setting-up-Kafka-using-Snowflake-Sink-Connector-and-testing-Streams
https://docs.snowflake.com/en/user-guide/kafka-connector-install.html
You should remove the smart quotes and use normal ones, like:
'snowflake.url.name'='my.domain.com:443',
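Applying all of the above to the config in the question, the affected lines would look something like this (plain quotes only, no https:// prefix, upper-case table name; the values are just the placeholders from the question):
"snowflake.url.name"='mybu.mycompany.us-east-1.aws.privatelink.snowflakecomputing.com:443',
"snowflake.user.name"='myuser',
"snowflake.private.key"='<valid private key>',
"snowflake.private.key.passphrase"='<valid pass phrase>',
"snowflake.topic2table.map"='USERPROFILE:UK_SF_DB1_TABLE1',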

Kafka Connect - Missing Text

Kafka Version : 2.12-2.1.1
I created a very simple example with a source and a sink connector by using the following command:
bin\windows\connect-standalone.bat config\connect-standalone.properties config\connect-file-source.properties config\connect-file-sink.properties
Source file name : test_2.txt
Sink file name : test.sink_2.txt
A topic named "connect-test-2" is used and I created a consumer in PowerShell to show the result.
It works perfectly the first time. However, after I reboot my machine and start everything again, I find that some text is missing.
For example, when I type the characters below into test_2.txt file and save as following:
HAHAHA..
missing again
some text are missing
I am able to enter text
first letter is missing
testing testing.
The result window (consumer) and the sink file show the following:
As you can see, some text is missing and I cannot find out why this happens. Any advice?
[Added information below]
connect-file-source.properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=test_2.txt
topic=connect-test-2
connect-file-sink.properties
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink_2.txt
topics=connect-test-2
I think the strange behaviour is caused by the way you are modifying the source file (test_2.txt).
How did you apply the changes after stopping the connector?
Using some editor <-- I think you used this method
Appending only new characters to the end of the file
FileStreamSource tracks changes based on the position in the file. You are using Kafka Connect in standalone mode, so the current position is written to the /tmp/connect.offsets file.
If you modify the source file with an editor, the whole content of the file changes. However, FileStreamSource only checks whether the size has changed, and it polls only the characters whose offsets in the file are bigger than the last offset processed by the connector.
You should modify the source file only by appending new characters to the end of the file.
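For reference, the location of that offsets file is controlled by this setting in connect-standalone.properties (the value shown is the default from the shipped template); deleting the file while the connector is stopped makes the source re-read the input file from the beginning:
offset.storage.file.filename=/tmp/connect.offsets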

Getting null.txt file when using Cygnus HDFS sink

I'm using Cygnus 0.5 with the default configuration for the HDFS sink. In order to make it run, I have deactivated the "ds" interceptor (otherwise I get an error at start time that prevents Cygnus from starting, related to not finding the matching table file).
Cygnus seems to work, but the file in which entity information is stored in HDFS gets a weird name: "null.txt". How can I fix this?
First of all, do not deactivate the DestinationExtractor interceptor. This is the piece of code that infers the destination where the context data notified by Orion is going to be persisted. Please note that the destination may refer to an HDFS file name, a MySQL table name or a CKAN resource name, depending on the sinks you have configured. Once inferred, the destination is added to the internal Flume event as a header called destination so that the sinks know where to persist the data. Thus, if the interceptor is deactivated, such a header is not found by the sinks and a null name is used as the destination name.
Regarding the "matching table file not found" problem you experienced (and which led you to deactivate the interceptor), it was due to the Cygnus configuration template having a bad default value for the cygnusagent.sources.http-source.interceptors.de.matching_table parameter. This has been solved in Cygnus 0.5.1.
A workaround while Cygnus 0.5.1 gets released is:
Do not deactivate the DestinationExtractor (as @frb says in his answer)
Create an empty matching table file and use it for the matching_table configuration, i.e. touch /tmp/dummy_table.conf, then set in the Cygnus configuration file: cygnusagent.sources.http-source.interceptors.de.matching_table = /tmp/dummy_table.conf

How to make ActiveMQ transportConnector property environmentaly-dependent

I'm looking for a way to replace this on my ActiveMQ config:
<transportConnector uri="tcp://localhost:60019" disableAsyncDispatch="false"/>
with a "not-hardcoded" URI (e.g., replacing "localhost" with a variable that resolves to an instance dependent value). The problem is that as we have many JBoss instances per server, and that URI above resolves to 0.0.0.0:60019, only one instance at a time can be running, unless we configure it in a per-application basis, which is not only frustrating, but there are circumstances where it is not enough (should be per-instance based, which is much more frustrating).
Each JBoss server has its own IP address, so I thought of using ${jboss.bind.address} to circumvent this, but it won't syntax. We also have an environment variable %SERVERIP% which could be used for this calling it from a start up script, but I don't know if ActiveMQ reads an environment variable for assigning its transport connector URI.
Any help would be much appreciated.
Use a PropertyPlaceholderConfigurer and you should be able to replace the URI with some ${variable} resolved from a file or from a JVM system property. This should work since the ActiveMQ configuration is really just a Spring context.
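A minimal sketch of what that could look like inside activemq.xml, assuming the broker XML is loaded as a Spring context and that jboss.bind.address is available as a JVM system property (the placeholder name is just the one from the question):
<!-- resolve ${...} placeholders, letting JVM system properties win -->
<bean class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer">
    <property name="systemPropertiesModeName" value="SYSTEM_PROPERTIES_MODE_OVERRIDE"/>
</bean>
<!-- the connector can then use the placeholder instead of a hard-coded host -->
<transportConnector uri="tcp://${jboss.bind.address}:60019" disableAsyncDispatch="false"/>
The value could then be supplied per instance from the start-up script, e.g. by adding -Djboss.bind.address=%SERVERIP% to the JVM arguments.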