What does the `port` mean in the Kafka ZooKeeper path `/brokers/ids/$id`?

I have two Kafka listeners, configured as follows:
listeners=PUBLIC_SASL://0.0.0.0:5011,PUBLIC_PLAIN://0.0.0.0:5010
advertised.listeners=PUBLIC_SASL://192.168.181.2:5011,PUBLIC_PLAIN://192.168.181.2:5010
listener.security.protocol.map=PUBLIC_SASL:SASL_PLAINTEXT,PUBLIC_PLAIN:PLAINTEXT
inter.broker.listener.name=PUBLIC_SASL
Port 5010 is PLAINTEXT and port 5011 is SASL_PLAINTEXT.
After startup, I found this information in ZooKeeper (/brokers/ids/$id):
{
"listener_security_protocol_map": {
"PUBLIC_SASL": "SASL_PLAINTEXT",
"PUBLIC_PLAIN": "PLAINTEXT"
},
"endpoints": [
"PUBLIC_SASL://192.168.181.2:5011",
"PUBLIC_PLAIN://192.168.181.2:5010"
],
"jmx_port": -1,
"features": { },
"host": "192.168.181.2",
"timestamp": "1658485899402",
"port": 5010,
"version": 5
}
What does the port field mean? Why is the port 5010? Could I change it to 5011?

What you're seeing are the legacy host and port values, which correspond to the deprecated advertised.host.name and advertised.port Kafka settings. They may be parsed from the advertised.listeners list for backward compatibility. Both are deprecated, however, and Kafka now uses the listener security protocol map and the corresponding endpoints list instead.
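To make the relationship between the fields concrete, here is a minimal Python sketch that parses the znode content from the question and resolves each endpoint through the protocol map. The helper names (parse_endpoints, legacy_port_guess) are made up for illustration. The rule in legacy_port_guess, that the legacy port mirrors the PLAINTEXT endpoint, is an assumption that happens to be consistent with the output above; it is not taken from the Kafka documentation.

```python
import json

# Broker registration copied from the question (/brokers/ids/$id).
BROKER_ZNODE = """
{
  "listener_security_protocol_map": {
    "PUBLIC_SASL": "SASL_PLAINTEXT",
    "PUBLIC_PLAIN": "PLAINTEXT"
  },
  "endpoints": [
    "PUBLIC_SASL://192.168.181.2:5011",
    "PUBLIC_PLAIN://192.168.181.2:5010"
  ],
  "jmx_port": -1,
  "features": {},
  "host": "192.168.181.2",
  "timestamp": "1658485899402",
  "port": 5010,
  "version": 5
}
"""


def parse_endpoints(znode):
    """Return {listener_name: (security_protocol, host, port)}."""
    protocols = znode["listener_security_protocol_map"]
    parsed = {}
    for endpoint in znode["endpoints"]:
        # Split manually: listener names may contain characters such as "_"
        # that generic URL parsers do not accept in a scheme.
        listener, _, address = endpoint.partition("://")
        host, _, port = address.rpartition(":")
        parsed[listener] = (protocols[listener], host, int(port))
    return parsed


def legacy_port_guess(znode):
    """ASSUMPTION: the legacy port mirrors the PLAINTEXT endpoint, else -1."""
    for protocol, _host, port in parse_endpoints(znode).values():
        if protocol == "PLAINTEXT":
            return port
    return -1


znode = json.loads(BROKER_ZNODE)
print(parse_endpoints(znode))
print(legacy_port_guess(znode) == znode["port"])
```

Clients that understand listeners would pick an entry from the endpoints list by listener name or security protocol; the top-level host and port are only of interest to older tooling.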

Related

ActiveMQ Artemis how to filter message by part of the "text" field

Is there any way to filter messages in ActiveMQ Artemis 2.10.0 by part of the "text" field using the management console?
I use method "browse(java.lang.String)" and try to filter my message (example below) by this expression:
text LIKE '%777-555-333-111%'
Message example:
{
"address": "ADDRESS.EXAMPLE",
"ShortProperties": {},
"messageID": "11111",
"priority": 4,
"type": 3,
"redelivered": false,
"ByteProperties": {
"_AMQ_ROUTING_TYPE": 1
},
"IntProperties": {
"CamelHttpResponseCode": 200
},
"durable": true,
"StringProperties": {
"Server": "nginx\/1.19.5",
"CamelHttpCharacterEncoding": "UTF-8",
"Content_HYPHEN_Type": "application\/xop+xml",
"connection": "keep-alive"
},
"DoubleProperties": {},
"expiration": 0,
"text": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><processId>777-555-333-111<\/processId><\/error>",
"BooleanProperties": {},
"FloatProperties": {}
}
However, it doesn't give me any results.
I would be grateful for a hint on whether this is possible on my current Artemis version.
The filter used by the browse management operation (as well as that used by JMS consumers, etc.) only applies to message headers and properties. You can't filter a message by the text in its body.
The data that you pasted is just the serialized message data sent to the client after the filter has already been applied.
Apache ActiveMQ Artemis also supports special XPath filters, which operate on the body of a message. The body must be XML; see the documentation for further details.
To use an XPath filter use this syntax:
XPATH '<xpath-expression>'
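As a rough illustration of what an XPath match on the body means, the Python sketch below evaluates a path against two bodies with the standard library. This is not how Artemis evaluates XPATH filters, and ElementTree only supports a subset of XPath. The body_matches helper and the well-formed variant of the body are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def body_matches(body: str, path: str) -> bool:
    """Return True if the XML body has at least one element matching path."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # A body that is not well-formed XML can never match.
        return False
    return root.find(path) is not None


# Hypothetical, well-formed variant of the body from the question.
WELL_FORMED = "<error><processId>777-555-333-111</processId></error>"
# The text as pasted in the question: a closing </error> without an opening tag.
AS_PASTED = "<processId>777-555-333-111</processId></error>"

# ElementTree path syntax; the [.='text'] predicate needs Python 3.7+.
PATH = ".//processId[.='777-555-333-111']"

print(body_matches(WELL_FORMED, PATH))
print(body_matches(AS_PASTED, PATH))
```

The second body is rejected because it does not parse, which is worth checking first if an XPath filter unexpectedly returns nothing.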

Debezium Connector filter "partly" working

We have a debezium connector that works without any errors. Two filtering conditions are applied and one of them works as intended but the other one seems to have no effect. These are the important parts of the config:
"connector.class": "io.debezium.connector.oracle.OracleConnector",
"transforms.filter.topic.regex": "topicname",
"database.connection.adapter": "logminer",
"transforms": "filter",
"schema.include.list": "xxxx",
"transforms.filter.type": "io.debezium.transforms.Filter",
"transforms.filter.language": "jsr223.groovy",
"tombstones.on.delete": "false",
"transforms.filter.condition": "value.op == \"c\" && value.after.QUEUELOCATIONTYPE == 5",
"table.include.list": "xxxxxx",
"skipped.operations": "u,d,r",
"snapshot.mode": "initial",
"topics": "xxxxxxx"
As you can see, we want only the records that have op set to "c" and "QUEUELOCATIONTYPE" set to 5. In the Kafka topic, all the records have the op field set to "c", but the second condition does not work: there are records with QUEUELOCATIONTYPE set to 2, 3, 4, etc.
A sample record is given below.
"payload": {
"before": null,
"after": {
"EVENTOBJECTID": "749dc9ea-a7aa-44c2-9af7-10574769c7db",
"QUEUECODE": "STDQSTDBKP",
"STATE": 6,
"RECORDDATE": 1638964344000,
"RECORDREQUESTOBJECTID": "32b7f617-60e8-4020-98b0-66f288433031",
"QUEUELOCATIONTYPE": 4,
"RETRYCOUNT": 0,
"RECORDCHANNELCODE": null,
"MESSAGEBROKERSERVERID": 1
},
"op": "c",
"ts_ms": 1638953572392,
"transaction": null
}
}
What may be the problem? Even though I didn't expect it to help, I've tried switching the order of the conditions. There are no error codes, and the connector is running.
OK, solved it. I was using a pre-created config. While reading the documentation, I saw that "skipped.operations": "u,d,r" is not an Oracle configuration; it was in the MySQL documentation. So I deleted it and changed the connector name (cached data so often causes problems). It looks like it's working now.
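When a filter seems to have no effect, it can help to check the condition itself outside the connector. The Python sketch below mirrors the Groovy condition from the config and applies it to the sample record. It is only a stand-in for the expression: keep_record is a made-up helper, and Debezium evaluates the real condition through its scripting engine, not like this.

```python
def keep_record(value: dict) -> bool:
    """Mirror of: value.op == "c" && value.after.QUEUELOCATIONTYPE == 5"""
    after = value.get("after") or {}
    return value.get("op") == "c" and after.get("QUEUELOCATIONTYPE") == 5


# Payload of the sample record from the question (fields trimmed).
sample = {
    "before": None,
    "after": {
        "EVENTOBJECTID": "749dc9ea-a7aa-44c2-9af7-10574769c7db",
        "QUEUECODE": "STDQSTDBKP",
        "STATE": 6,
        "QUEUELOCATIONTYPE": 4,
    },
    "op": "c",
}

# Same record with the value the filter is supposed to let through.
wanted = {**sample, "after": {**sample["after"], "QUEUELOCATIONTYPE": 5}}

print(keep_record(sample))
print(keep_record(wanted))
```

Since the sample record fails the condition yet still reached the topic, the expression itself was not the issue, which matches the finding that the surrounding connector configuration was at fault.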

Handling empty/invalid Mqtt Messages with Kafka Connect

I am trying to ingest data from Mqtt into Kafka. Unfortunately, some of those Mqtt-Messages are either empty or invalid JSON. I assume that is what leads to the following exception:
{
"name": "source_mqtt_alarms",
"connector": {
"state": "RUNNING",
"worker_id": "-redacted-:8083"
},
"tasks": [
{
"id": 0,
"state": "FAILED",
"worker_id": "-redacted-:8083",
"trace": "org.apache.kafka.connect.errors.ConnectException:
Tolerance exceeded in error handler\n\tat org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:196)\n\t
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:122)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.convertTransformedRecord(WorkerSourceTask.java:314)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:340)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:264)\n\t
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)\n\t
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)\n\t
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\t
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\t
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\t
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\t
at java.base/java.lang.Thread.run(Thread.java:834)\n
Caused by: org.apache.kafka.connect.errors.DataException: Conversion error: null value for field that is required and has no default value\n\t
at org.apache.kafka.connect.json.JsonConverter.convertToJson(JsonConverter.java:611)\n\t
at org.apache.kafka.connect.json.JsonConverter.convertToJsonWithEnvelope(JsonConverter.java:592)\n\t
at org.apache.kafka.connect.json.JsonConverter.fromConnectData(JsonConverter.java:346)\n\t
at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:63)\n\t
at org.apache.kafka.connect.runtime.WorkerSourceTask.lambda$convertTransformedRecord$2(WorkerSourceTask.java:314)\n\t
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:146)\n\t
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:180)\n\t
... 11 more\n"
}
],
"type": "source"
}
From what I've learned so far, it looks like the incoming (empty/invalid) messages do not contain values that are declared as non-optional, which leads to the exception above.
My question would be, where is the connector taking that expectation from? It says "null value for field that is required and has no default value", but how is that field required if the schema is (I assume) created per message?
Additional information:
I am using the Lenses.io Stream Reactor Mqtt Source Connector. The configuration is as follows:
{
"name": "source_mqtt_alarms",
"config": {
"topics": "alarms",
"connect.mqtt.kcql": "INSERT INTO alarms SELECT * FROM `-redacted-/+/alarms` WITHCONVERTER=`com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter`",
"connect.mqtt.client.id": "kafka_connect_alarms",
"tasks.max": 1,
"connector.class": "com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
"connect.mqtt.service.quality": 2,
"connect.mqtt.hosts": "ssl://-redacted-:8883",
"connect.mqtt.ssl.ca.cert": "/usr/share/certs/cumu.crt",
"connect.mqtt.ssl.cert": "/usr/share/certs/mqtt.crt",
"connect.mqtt.ssl.key": "/usr/share/certs/mqtt.pem",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": true,
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": true,
}
}
Edit: I just went through the logs of the Kafka Connect worker, and they give a bit more information. Prior to the exception above, I get a lot of these:
[2021-05-26 08:27:19,552] ERROR Error handling message with id:0 on topic:-redacted-/alarms (com.datamountaineer.streamreactor.connect.mqtt.source.MqttManager)
java.util.NoSuchElementException: head of empty list
at scala.collection.immutable.Nil$.head(List.scala:430)
at scala.collection.immutable.Nil$.head(List.scala:427)
at com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter$.convert(JsonSimpleConverter.scala:76)
at com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter$.convert(JsonSimpleConverter.scala:70)
at com.datamountaineer.streamreactor.connect.converters.source.JsonSimpleConverter.convert(JsonSimpleConverter.scala:37)
at com.datamountaineer.streamreactor.connect.mqtt.source.MqttManager.messageArrived(MqttManager.scala:110)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.deliverMessage(CommsCallback.java:514)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.handleMessage(CommsCallback.java:417)
at org.eclipse.paho.client.mqttv3.internal.CommsCallback.run(CommsCallback.java:214)
at java.base/java.lang.Thread.run(Thread.java:834)

How to migrate consumer offsets using MirrorMaker 2.0?

With Kafka 2.7.0, I am using MirrorMaker 2.0 as a Kafka Connect connector to replicate all the topics from the primary Kafka cluster to the backup cluster.
All the topics are being replicated perfectly except __consumer_offsets. Below is the connector configuration:
{
"name": "test-connector",
"config": {
"connector.class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
"topics.blacklist": "some-random-topic",
"replication.policy.separator": "",
"source.cluster.alias": "",
"target.cluster.alias": "",
"exclude.internal.topics":"false",
"tasks.max": "10",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"source.cluster.bootstrap.servers": "xx.xx.xxx.xx:9094",
"target.cluster.bootstrap.servers": "yy.yy.yyy.yy:9094",
"topics": "test-topic-from-primary,primary-kafka-connect-offset,primary-kafka-connect-config,primary-kafka-connect-status,__consumer_offsets"
}
}
In a similar question here, the accepted answer says the following:
Add this in your consumer.config:
exclude.internal.topics=false
And add this in your producer.config:
client.id=__admin_client
Where do I add these in my configuration?
The Connector Configuration Properties here do not list a property named client.id. I have set exclude.internal.topics to false, though.
Is there something I am missing here?
UPDATE
I learned that Kafka 2.7 and above supports automated consumer offset sync using MirrorCheckpointTask as mentioned here.
I have created a connector for this having the below configurations:
{
"name": "mirror-checkpoint-connector",
"config": {
"connector.class": "org.apache.kafka.connect.mirror.MirrorCheckpointConnector",
"sync.group.offsets.enabled": "true",
"source.cluster.alias": "",
"target.cluster.alias": "",
"exclude.internal.topics":"false",
"tasks.max": "10",
"key.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
"source.cluster.bootstrap.servers": "xx.xx.xxx.xx:9094",
"target.cluster.bootstrap.servers": "yy.yy.yyy.yy:9094",
"topics": "__consumer_offsets"
}
}
Still no help.
Is this the correct approach? Is there something needed?
You do not want to replicate __consumer_offsets. The offsets on the source and destination clusters will not be the same, for various reasons.
MirrorMaker 2 can translate offsets instead: it populates the destination cluster with translated offsets derived from the source cluster. See https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0
__consumer_offsets is ignored by default:
topics.exclude = [.*[\-\.]internal, .*\.replica, __.*]
You would need to override this config to replicate it.
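To see why __consumer_offsets is skipped, the following Python sketch applies the three default patterns quoted above to a few topic names from the question. It assumes each pattern must match the whole topic name; is_excluded is a made-up helper, not a MirrorMaker API.

```python
import re

# Default exclude patterns as quoted in the answer above.
DEFAULT_TOPICS_EXCLUDE = [r".*[\-\.]internal", r".*\.replica", r"__.*"]


def is_excluded(topic, patterns=DEFAULT_TOPICS_EXCLUDE):
    """Return True if any pattern matches the entire topic name."""
    return any(re.fullmatch(pattern, topic) for pattern in patterns)


for topic in [
    "__consumer_offsets",
    "test-topic-from-primary",
    "primary-kafka-connect-offset",
]:
    print(topic, is_excluded(topic))
```

Only __consumer_offsets is caught, by the __.* pattern, so listing it under "topics" has no effect while the exclude list keeps its default value.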

Azure Service Fabric IPv6 networking issues

We are having issues deploying our Service Fabric cluster to Azure and having it handle both IPv4
and IPv6 traffic.
We are developing an application that has mobile clients on iOS and Android which communicate with
our Service Fabric cluster. The communication consists of both HTTP traffic and TCP socket communication.
We need to support IPv6 in order to have Apple accept the app in their App Store.
We are using an ARM template for deploying to Azure, as it seems the portal does not support configuring
a load balancer with an IPv6 configuration for Virtual Machine Scale Sets (ref: url). The linked page also states other limitations
of the IPv6 support, such as that private IPv6 addresses cannot be deployed to VM Scale Sets. However, according
to this page, the possibility of assigning private IPv6 addresses to VM Scale Sets is available in preview
(although this was last updated 07/14/2017).
For this question I have tried to keep this as general as possible, and based the ARM template on a template found
in this tutorial. The template is called "template_original.json" and can be downloaded from
here. This is a basic template for a service fabric cluster with no security for simplicity.
I will be linking the entire modified ARM template in the bottom of this post, but will highlight the
main modified parts first.
Public IPv4 and IPv6 addresses that are associated with the load balancer. These are associated with their respective backend pools:
"frontendIPConfigurations": [
{
"name": "LoadBalancerIPv4Config",
"properties": {
"publicIPAddress": {
"id": "[resourceId('Microsoft.Network/publicIPAddresses',concat(parameters('lbIPv4Name'),'-','0'))]"
}
}
},
{
"name": "LoadBalancerIPv6Config",
"properties": {
"publicIPAddress": {
"id": "[resourceId('Microsoft.Network/publicIPAddresses',concat(parameters('lbIPv6Name'),'-','0'))]"
}
}
}
],
"backendAddressPools": [
{
"name": "LoadBalancerIPv4BEAddressPool",
"properties": {}
},
{
"name": "LoadBalancerIPv6BEAddressPool",
"properties": {}
}
],
Load balancing rules for frontend ports on respective public IP addresses, both IPv4 and IPv6.
This amounts to four rules in total, two per front end port. I have added port 80 for HTTP here and port 5607 for Socket connection.
Note that I have updated the backend port for IPv6 port 80 to be 8081 and IPv6 port 8507 to be 8517.
{
"name": "AppPortLBRule1Ipv4",
"properties": {
"backendAddressPool": {
"id": "[variables('lbIPv4PoolID0')]"
},
"backendPort": "[parameters('loadBalancedAppPort1')]",
"enableFloatingIP": "false",
"frontendIPConfiguration": {
"id": "[variables('lbIPv4Config0')]"
},
"frontendPort": "[parameters('loadBalancedAppPort1')]",
"idleTimeoutInMinutes": "5",
"probe": {
"id": "[concat(variables('lbID0'),'/probes/AppPortProbe1')]"
},
"protocol": "tcp"
}
},
{
"name": "AppPortLBRule1Ipv6",
"properties": {
"backendAddressPool": {
"id": "[variables('lbIPv6PoolID0')]"
},
/*"backendPort": "[parameters('loadBalancedAppPort1')]",*/
"backendPort": 8081,
"enableFloatingIP": "false",
"frontendIPConfiguration": {
"id": "[variables('lbIPv6Config0')]"
},
"frontendPort": "[parameters('loadBalancedAppPort1')]",
/*"idleTimeoutInMinutes": "5",*/
"probe": {
"id": "[concat(variables('lbID0'),'/probes/AppPortProbe1')]"
},
"protocol": "tcp"
}
},
{
"name": "AppPortLBRule2Ipv4",
"properties": {
"backendAddressPool": {
"id": "[variables('lbIPv4PoolID0')]"
},
"backendPort": "[parameters('loadBalancedAppPort2')]",
"enableFloatingIP": "false",
"frontendIPConfiguration": {
"id": "[variables('lbIPv4Config0')]"
},
"frontendPort": "[parameters('loadBalancedAppPort2')]",
"idleTimeoutInMinutes": "5",
"probe": {
"id": "[concat(variables('lbID0'),'/probes/AppPortProbe2')]"
},
"protocol": "tcp"
}
},
{
"name": "AppPortLBRule2Ipv6",
"properties": {
"backendAddressPool": {
"id": "[variables('lbIPv6PoolID0')]"
},
"backendPort": 8517,
"enableFloatingIP": "false",
"frontendIPConfiguration": {
"id": "[variables('lbIPv6Config0')]"
},
"frontendPort": "[parameters('loadBalancedAppPort2')]",
/*"idleTimeoutInMinutes": "5",*/
"probe": {
"id": "[concat(variables('lbID0'),'/probes/AppPortProbe2')]"
},
"protocol": "tcp"
}
}
Also added one probe per load balancing rule, but omitted here for clarity.
The apiVersion for the VM Scale Set is set to "2017-03-30", per the recommendation from the aforementioned preview solution.
The network interface configurations are configured according to recommendations as well.
"networkInterfaceConfigurations": [
{
"name": "[concat(parameters('nicName'), '-0')]",
"properties": {
"ipConfigurations": [
{
"name": "[concat(parameters('nicName'),'-IPv4Config-',0)]",
"properties": {
"privateIPAddressVersion": "IPv4",
"loadBalancerBackendAddressPools": [
{
"id": "[variables('lbIPv4PoolID0')]"
}
],
"loadBalancerInboundNatPools": [
{
"id": "[variables('lbNatPoolID0')]"
}
],
"subnet": {
"id": "[variables('subnet0Ref')]"
}
}
},
{
"name": "[concat(parameters('nicName'),'-IPv6Config-',0)]",
"properties": {
"privateIPAddressVersion": "IPv6",
"loadBalancerBackendAddressPools": [
{
"id": "[variables('lbIPv6PoolID0')]"
}
]
}
}
],
"primary": true
}
}
]
With this template I am able to successfully deploy it to Azure. Communication using IPv4 with the
cluster works as expected, however I am unable to get any IPv6 traffic through at all. This is the
same for both ports 80 (HTTP) and 5607 (socket).
When viewing the list of backend pools for the load balancer in the Azure portal it displays the
following information message which I have been unable to find any information about. I am unsure
if this affects anything in any way?
Backend pool 'loadbalanceripv6beaddresspool' was removed from Virtual machine scale set 'Node1'. Upgrade all the instances of 'Node1' for this change to apply Node1
I am not sure why I cannot get the traffic through on IPv6. It might be that there is something I
have missed in the template, or some other error on my part. If any additional information is required,
don't hesitate to ask.
Here is the entire ARM template. Due to the length and post length limitations I have not embedded it, but here is a Pastebin link to the full ARM Template (Updated).
Update
Some information regarding debugging the IPv6 connectivity. I have tried slightly altering the ARM template to forward the IPv6 traffic on port 80 to backend port 8081 instead. So IPv4 is 80=>80 and IPv6 80=>8081. The ARM template has been updated (see link in previous section).
On port 80 I am running Kestrel as a stateless web server. I have the following entries in the ServiceManifest.xml:
<Endpoint Protocol="http" Name="ServiceEndpoint1" Type="Input" Port="80" />
<Endpoint Protocol="http" Name="ServiceEndpoint3" Type="Input" Port="8081" />
I have been a bit unsure specifically which addresses to listen on in Kestrel. FabricRuntime.GetNodeContext().IPAddressOrFQDN always returns the IPv4 address, and this is currently how we start it. For debugging, I currently collect all the IPv6 addresses and, as a hardcoded hack, use one of them for port 8081. For port 80 I use IPAddress.IPv6Any; however, this always defaults to the IPv4 address returned by FabricRuntime.GetNodeContext().IPAddressOrFQDN.
protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners()
{
var endpoints = Context.CodePackageActivationContext.GetEndpoints()
.Where(endpoint => endpoint.Protocol == EndpointProtocol.Http ||
endpoint.Protocol == EndpointProtocol.Https);
var strHostName = Dns.GetHostName();
var ipHostEntry = Dns.GetHostEntry(strHostName);
var ipv6Addresses = new List<IPAddress>();
ipv6Addresses.AddRange(ipHostEntry.AddressList.Where(
ipAddress => ipAddress.AddressFamily == AddressFamily.InterNetworkV6));
var listeners = new List<ServiceInstanceListener>();
foreach (var endpoint in endpoints)
{
var instanceListener = new ServiceInstanceListener(serviceContext =>
new KestrelCommunicationListener(
serviceContext,
(url, listener) => new WebHostBuilder().
UseKestrel(options =>
{
if (endpoint.Port == 8081 && ipv6Addresses.Count > 0)
{
// change idx to test different IPv6 addresses found
options.Listen(ipv6Addresses[0], endpoint.Port);
}
else
{
// always defaults to ipv4 address
options.Listen(IPAddress.IPv6Any, endpoint.Port);
}
}).
ConfigureServices(
services => services
.AddSingleton<StatelessServiceContext>(serviceContext))
.UseContentRoot(Directory.GetCurrentDirectory())
.UseServiceFabricIntegration(listener, ServiceFabricIntegrationOptions.None)
.UseStartup<Startup>()
.UseUrls(url)
.Build()), endpoint.Name);
listeners.Add(instanceListener);
}
return listeners;
}
Here are the endpoints shown in the Service Fabric Explorer for one of the nodes: Endpoint addresses
Regarding the socket listener, I have also altered it so that IPv6 is forwarded to backend port 8517 instead of 8507. Similarly to the Kestrel web server, the socket listener opens two listening instances on the respective addresses with the appropriate ports.
I hope this information is of any help.
It turns out I made a very stupid mistake that is completely my fault: I forgot to actually verify that my ISP fully supports IPv6. It turns out they don't!
Testing from a provider with full IPv6 support works as it should and I am able to get full connectivity to the nodes in the Service Fabric cluster.
Here is the working ARM template for anyone that needs a fully working example of Service Fabric cluster with IPv4 and IPv6 support:
Not allowed to post pastebin links without an accompanying code snippet...
Update:
Due to length constraints the template could not be pasted in this thread in its entirety, however over on the GitHub Issues page for Service Fabric I crossposted this. The ARM template is posted as a comment in that thread, it will hopefully be available longer than the pastebin link. View it here.