I'm trying to run a Pulsar DebeziumPostgresSource connector.
This is the command I'm running:
bin/pulsar-admin \
--admin-url https://localhost:8443 \
--auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken \
--auth-params file:///pulsar/tokens/broker/token \
--tls-allow-insecure \
source localrun \
--broker-service-url pulsar+ssl://my-pulsar-server:6651 \
--client-auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken \
--client-auth-params file:///pulsar/tokens/broker/token \
--tls-allow-insecure \
--source-config-file /pulsar/debezium-config/my-source-config.yaml
Here's the /pulsar/debezium-config/my-source-config.yaml file:
tenant: my-tenant
namespace: my-namespace
name: my-source
topicName: my-topic
archive: connectors/pulsar-io-debezium-postgres-2.6.0-SNAPSHOT.nar
parallelism: 1
configs:
  plugin.name: pgoutput
  database.hostname: my-db-server
  database.port: "5432"
  database.user: my-db-user
  database.password: my-db-password
  database.dbname: my-db
  database.server.name: my-db-server-name
  table.whitelist: my_schema.my_table
  pulsar.service.url: pulsar+ssl://my-pulsar-server:6651/
And here's the output from the command above:
11:47:29.924 [main] INFO org.apache.pulsar.functions.runtime.RuntimeSpawner - my-tenant/my-namespace/my-source-0 RuntimeSpawner starting function
11:47:29.925 [main] INFO org.apache.pulsar.functions.runtime.thread.ThreadRuntime - ThreadContainer starting function with instance config InstanceConfig(instanceId=0, functionId=4073a1d9-1312-4570-981b-6723626e394a, functionVersion=01d5a3a7-c6d7-4f79-8717-403ad1371411, functionDetails=tenant: "my-tenant"
namespace: "my-namespace"
name: "my-source"
className: "org.apache.pulsar.functions.api.utils.IdentityFunction"
autoAck: true
parallelism: 1
source {
className: "org.apache.pulsar.io.debezium.postgres.DebeziumPostgresSource"
configs: "{\"database.user\":\"my-db-user\",\"database.dbname\":\"my-db\",\"database.hostname\":\"my-db-server\",\"database.password\":\"my-db-password\",\"database.server.name\":\"my-db-server-name\",\"plugin.name\":\"pgoutput\",\"database.port\":\"5432\",\"pulsar.service.url\":\"pulsar+ssl://my-pulsar-server:6651/\",\"table.whitelist\":\"my_schema.my_table\"}"
typeClassName: "org.apache.pulsar.common.schema.KeyValue"
}
sink {
topic: "my-topic"
typeClassName: "org.apache.pulsar.common.schema.KeyValue"
}
resources {
cpu: 1.0
ram: 1073741824
disk: 10737418240
}
componentType: SOURCE
, maxBufferedTuples=1024, functionAuthenticationSpec=null, port=39135, clusterName=local, maxPendingAsyncRequests=1000)
11:47:32.552 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xf8ffbf24, L:/redacted-ip-l:43802 - R:my-pulsar-server/redacted-ip-r:6651]] Connected to server
11:47:33.240 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - Starting Pulsar producer perf with config: {
"topicName" : "my-topic",
"producerName" : null,
"sendTimeoutMs" : 0,
"blockIfQueueFull" : true,
"maxPendingMessages" : 1000,
"maxPendingMessagesAcrossPartitions" : 50000,
"messageRoutingMode" : "CustomPartition",
"hashingScheme" : "Murmur3_32Hash",
"cryptoFailureAction" : "FAIL",
"batchingMaxPublishDelayMicros" : 10000,
"batchingPartitionSwitchFrequencyByPublishDelay" : 10,
"batchingMaxMessages" : 1000,
"batchingMaxBytes" : 131072,
"batchingEnabled" : true,
"chunkingEnabled" : false,
"compressionType" : "LZ4",
"initialSequenceId" : null,
"autoUpdatePartitions" : true,
"multiSchema" : true,
"properties" : {
"application" : "pulsar-source",
"id" : "my-tenant/my-namespace/my-source",
"instance_id" : "0"
}
}
11:47:33.259 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerStatsRecorderImpl - Pulsar client config: {
"serviceUrl" : "pulsar+ssl://my-pulsar-server:6651",
"authPluginClassName" : "org.apache.pulsar.client.impl.auth.AuthenticationToken",
"authParams" : "file:///pulsar/tokens/broker/token",
"authParamMap" : null,
"operationTimeoutMs" : 30000,
"statsIntervalSeconds" : 60,
"numIoThreads" : 1,
"numListenerThreads" : 1,
"connectionsPerBroker" : 1,
"useTcpNoDelay" : true,
"useTls" : true,
"tlsTrustCertsFilePath" : null,
"tlsAllowInsecureConnection" : true,
"tlsHostnameVerificationEnable" : false,
"concurrentLookupRequest" : 5000,
"maxLookupRequest" : 50000,
"maxLookupRedirects" : 20,
"maxNumberOfRejectedRequestPerConnection" : 50,
"keepAliveIntervalSeconds" : 30,
"connectionTimeoutMs" : 10000,
"requestTimeoutMs" : 60000,
"initialBackoffIntervalNanos" : 100000000,
"maxBackoffIntervalNanos" : 60000000000,
"listenerName" : null,
"useKeyStoreTls" : false,
"sslProvider" : null,
"tlsTrustStoreType" : "JKS",
"tlsTrustStorePath" : null,
"tlsTrustStorePassword" : null,
"tlsCiphers" : [ ],
"tlsProtocols" : [ ],
"proxyServiceUrl" : null,
"proxyProtocol" : null
}
11:47:33.418 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ConnectionPool - [[id: 0xab39f703, L:/redacted-ip-l:43806 - R:my-pulsar-server/redacted-ip-r:6651]] Connected to server
11:47:33.422 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ClientCnx - [id: 0xab39f703, L:/redacted-ip-l:43806 - R:my-pulsar-server/redacted-ip-r:6651] Connected through proxy to target broker at my-broker:6651
11:47:33.484 [pulsar-client-io-1-1] INFO org.apache.pulsar.client.impl.ProducerImpl - [my-topic] [null] Creating producer on cnx [id: 0xab39f703, L:/redacted-ip-l:43806 - R:my-pulsar-server/redacted-ip-r:6651]
11:48:33.434 [pulsar-client-io-1-1] ERROR org.apache.pulsar.client.impl.ProducerImpl - [my-topic] [null] Failed to create producer: 3 lookup request timedout after ms 30000
11:48:33.438 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0xab39f703, L:/redacted-ip-l:43806 - R:my-pulsar-server/redacted-ip-r:6651] request 3 timed out after 30000 ms
11:48:33.629 [main] INFO org.apache.pulsar.functions.LocalRunner - RuntimeSpawner quit because of
java.lang.RuntimeException: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 3 lookup request timedout after ms 30000
at org.apache.pulsar.functions.sink.PulsarSink$PulsarSinkAtMostOnceProcessor.<init>(PulsarSink.java:177) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.sink.PulsarSink$PulsarSinkAtLeastOnceProcessor.<init>(PulsarSink.java:206) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.sink.PulsarSink.open(PulsarSink.java:284) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setupOutput(JavaInstanceRunnable.java:819) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.instance.JavaInstanceRunnable.setup(JavaInstanceRunnable.java:224) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.instance.JavaInstanceRunnable.run(JavaInstanceRunnable.java:246) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_252]
Caused by: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 3 lookup request timedout after ms 30000
at org.apache.pulsar.client.api.PulsarClientException.unwrap(PulsarClientException.java:821) ~[org.apache.pulsar-pulsar-client-api-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.client.impl.ProducerBuilderImpl.create(ProducerBuilderImpl.java:93) ~[org.apache.pulsar-pulsar-client-original-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.sink.PulsarSink$PulsarSinkProcessorBase.createProducer(PulsarSink.java:106) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
at org.apache.pulsar.functions.sink.PulsarSink$PulsarSinkAtMostOnceProcessor.<init>(PulsarSink.java:174) ~[org.apache.pulsar-pulsar-functions-instance-2.6.0-SNAPSHOT.jar:2.6.0-SNAPSHOT]
... 6 more
11:48:59.956 [function-timer-thread-5-1] ERROR org.apache.pulsar.functions.runtime.RuntimeSpawner - my-tenant/my-namespace/my-source-java.lang.RuntimeException: org.apache.pulsar.client.api.PulsarClientException$TimeoutException: 3 lookup request timedout after ms 30000 Function Container is dead with exception.. restarting
As you can see, it failed to create a producer due to a TimeoutException. What are the likely causes of this error? What's the best way to further investigate this issue?
Additional info:
I have also tried the --tls-trust-cert-path /my/ca-certificates.crt option instead of --tls-allow-insecure, but got the same error.
I am able to list tenants:
bin/pulsar-admin \
--admin-url https://localhost:8443 \
--auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken \
--auth-params file:///pulsar/tokens/broker/token \
tenants list
# Output:
# "public"
# "pulsar"
# "my-topic"
But I am not able to get an OK broker health-check:
bin/pulsar-admin \
--admin-url https://localhost:8443 \
--auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken \
--auth-params file:///pulsar/tokens/broker/token \
brokers healthcheck
# Output:
# null
# Reason: java.util.concurrent.TimeoutException
bin/pulsar-admin \
--admin-url https://localhost:8443 \
--auth-plugin org.apache.pulsar.client.impl.auth.AuthenticationToken \
--auth-params file:///pulsar/tokens/broker/token \
--tls-allow-insecure \
brokers healthcheck
# Output:
# HTTP 500 Internal Server Error
# Reason: HTTP 500 Internal Server Error
In my case, the root cause was an expired TLS certificate.
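Since the root cause here was an expired certificate, one quick way to rule that out early is to ask OpenSSL whether the certificate presented on the TLS port is still within its validity period. The following is a minimal sketch, assuming `openssl` is installed; the function name `cert_status` is hypothetical, and the host and port in the usage comment are the placeholders from the question.

```shell
# cert_status: read a PEM certificate on stdin and report whether it is unexpired.
# `openssl x509 -checkend 0` exits 0 when the certificate does not expire within
# the next 0 seconds (i.e. it has not expired yet) and non-zero otherwise,
# including when no certificate could be read from the input.
cert_status() {
  if openssl x509 -noout -checkend 0 >/dev/null 2>&1; then
    echo "valid"
  else
    echo "expired-or-unreadable"
  fi
}

# Usage (placeholders from the question; not executed here):
#   openssl s_client -connect my-pulsar-server:6651 </dev/null 2>/dev/null | cert_status
```

Only the first certificate found in the input is checked, so a chain or CA bundle would need to be split before each entry can be tested.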
I used scaffolding to generate a new microservice, then made the following configuration for MongoDB:
logging:
  level:
    ROOT: DEBUG
    io.github.jhipster: DEBUG
    com.fzai.fileservice: DEBUG
eureka:
  instance:
    prefer-ip-address: true
  client:
    service-url:
      defaultZone: http://admin:${jhipster.registry.password}@localhost:8761/eureka/
spring:
  profiles:
    active: dev
    include:
      - swagger
      # Uncomment to activate TLS for the dev profile
      #- tls
  devtools:
    restart:
      enabled: true
      additional-exclude: static/**
    livereload:
      enabled: false # we use Webpack dev server + BrowserSync for livereload
  jackson:
    serialization:
      indent-output: true
  data:
    mongodb:
      host: 42.193.124.204
      port: 27017
      username: admin
      password: admin123
      authentication-database: fileService
      database: fileService
  mail:
    host: localhost
    port: 25
    username:
    password:
  messages:
    cache-duration: PT1S # 1 second, see the ISO 8601 standard
  thymeleaf:
    cache: false
  sleuth:
    sampler:
      probability: 1 # report 100% of traces
  zipkin: # Use the "zipkin" Maven profile to have the Spring Cloud Zipkin dependencies
    base-url: http://localhost:9411
    enabled: false
    locator:
      discovery:
        enabled: true
server:
  port: 8081
# ===================================================================
# JHipster specific properties
#
# Full reference is available at: https://www.jhipster.tech/common-application-properties/
# ===================================================================
jhipster:
  cache: # Cache configuration
    hazelcast: # Hazelcast distributed cache
      time-to-live-seconds: 3600
      backup-count: 1
      management-center: # Full reference is available at: http://docs.hazelcast.org/docs/management-center/3.9/manual/html/Deploying_and_Starting.html
        enabled: false
        update-interval: 3
        url: http://localhost:8180/mancenter
  # CORS is disabled by default on microservices, as you should access them through a gateway.
  # If you want to enable it, please uncomment the configuration below.
  cors:
    allowed-origins: "*"
    allowed-methods: "*"
    allowed-headers: "*"
    exposed-headers: "Authorization,Link,X-Total-Count"
    allow-credentials: true
    max-age: 1800
  security:
    client-authorization:
      access-token-uri: http://uaa/oauth/token
      token-service-id: uaa
      client-id: internal
      client-secret: internal
  mail: # specific JHipster mail property, for standard properties see MailProperties
    base-url: http://127.0.0.1:8081
  metrics:
    logs: # Reports metrics in the logs
      enabled: false
      report-frequency: 60 # in seconds
  logging:
    use-json-format: false # By default, logs are not in Json format
    logstash: # Forward logs to logstash over a socket, used by LoggingConfiguration
      enabled: false
      host: localhost
      port: 5000
      queue-size: 512
  audit-events:
    retention-period: 30 # Number of days before audit events are deleted.
oauth2:
  signature-verification:
    public-key-endpoint-uri: http://uaa/oauth/token_key
    #ttl for public keys to verify JWT tokens (in ms)
    ttl: 3600000
    #max. rate at which public keys will be fetched (in ms)
    public-key-refresh-rate-limit: 10000
  web-client-configuration:
    #keep in sync with UAA configuration
    client-id: web_app
    secret: changeit
An error occurred while I was running the project:
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'mongobee' defined in class path resource [com/fzai/fileservice/config/DatabaseConfiguration.class]: Invocation of init method failed; nested exception is com.mongodb.MongoQueryException: Query failed with error code 13 and error message 'not authorized on fileService to execute command { find: "system.indexes", filter: { ns: "fileService.dbchangelog", key: { changeId: 1, author: 1 } }, limit: 1, singleBatch: true, $db: "fileService" }' on server 42.193.124.204:27017
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1771)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:593)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:515)
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:320)
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:318)
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:199)
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:847)
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:877)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:549)
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:141)
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:744)
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:391)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:312)
at com.fzai.fileservice.FileServiceApp.main(FileServiceApp.java:70)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Caused by: com.mongodb.MongoQueryException: Query failed with error code 13 and error message 'not authorized on fileService to execute command { find: "system.indexes", filter: { ns: "fileService.dbchangelog", key: { changeId: 1, author: 1 } }, limit: 1, singleBatch: true, $db: "fileService" }' on server 42.193.124.204:27017
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:706)
at com.mongodb.operation.FindOperation$1.call(FindOperation.java:695)
at com.mongodb.operation.OperationHelper.withConnectionSource(OperationHelper.java:462)
at com.mongodb.operation.OperationHelper.withConnection(OperationHelper.java:406)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:695)
at com.mongodb.operation.FindOperation.execute(FindOperation.java:83)
at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:179)
at com.mongodb.client.internal.FindIterableImpl.first(FindIterableImpl.java:198)
at com.github.mongobee.dao.ChangeEntryIndexDao.findRequiredChangeAndAuthorIndex(ChangeEntryIndexDao.java:35)
at com.github.mongobee.dao.ChangeEntryDao.ensureChangeLogCollectionIndex(ChangeEntryDao.java:121)
at com.github.mongobee.dao.ChangeEntryDao.connectMongoDb(ChangeEntryDao.java:61)
at com.github.mongobee.Mongobee.execute(Mongobee.java:143)
at com.github.mongobee.Mongobee.afterPropertiesSet(Mongobee.java:126)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1830)
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1767)
... 19 common frames omitted
But in another, simpler Spring Boot project, the same configuration works without errors:
spring:
  application:
    name: springboot1
  data:
    mongodb:
      host: 42.193.124.204
      port: 27017
      username: admin
      password: admin123
      authentication-database: fileService
      database: fileService
This is the user and role I created:
{
"_id" : "fileService.admin",
"userId" : UUID("03f75395-f129-4273-b6a6-b2dc3d1f7974"),
"user" : "admin",
"db" : "fileService",
"roles" : [
{
"role" : "dbOwner",
"db" : "fileService"
},
{
"role" : "readWrite",
"db" : "fileService"
}
],
"mechanisms" : [
"SCRAM-SHA-1",
"SCRAM-SHA-256"
]
}
I want to know what's wrong.
I am trying to test out Brooklin for mirroring data between Kafka clusters. I am following the wiki https://github.com/linkedin/brooklin/wiki/mirroring-kafka-clusters
Unlike the wiki, I am trying to set up mirroring between two different clusters. I am able to start the Brooklin process and the datastream, but no messages are mirrored. Brooklin is currently running on the source Kafka cluster. I am trying to mirror the topic 'test'.
The server.properties for Brooklin is:
############################# Server Basics #############################
brooklin.server.coordinator.cluster=brooklin-cluster
brooklin.server.coordinator.zkAddress=localhost:2181
brooklin.server.httpPort=32311
brooklin.server.connectorNames=file,test,kafkaMirroringConnector
brooklin.server.transportProviderNames=kafkaTransportProvider
brooklin.server.csvMetricsDir=/tmp/brooklin-example/
########################### Transport provider configs ######################
brooklin.server.transportProvider.kafkaTransportProvider.factoryClassName=com.linkedin.datastream.kafka.KafkaTransportProviderAdminFactory
brooklin.server.transportProvider.kafkaTransportProvider.bootstrap.servers=kafka-dest:9092
brooklin.server.transportProvider.kafkaTransportProvider.zookeeper.connect=kafka-dest:2181
brooklin.server.transportProvider.kafkaTransportProvider.client.id=datastream-producer
########################### File connector Configs ######################
brooklin.server.connector.file.factoryClassName=com.linkedin.datastream.connectors.file.FileConnectorFactory
brooklin.server.connector.file.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory
brooklin.server.connector.file.strategy.maxTasks=1
########################### Test event producing connector Configs ######################
brooklin.server.connector.test.factoryClassName=com.linkedin.datastream.connectors.TestEventProducingConnectorFactory
brooklin.server.connector.test.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.LoadbalancingStrategyFactory
brooklin.server.connector.test.strategy.TasksPerDatastream = 4
########################### Kafka Mirroring connector Configs ######################
brooklin.server.connector.kafkaMirroringConnector.factoryClassName=com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorFactory
brooklin.server.connector.kafkaMirroringConnector.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory
I then create the following datastream:
bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-mirroring-stream -s "kafka://localhost:9092/test" -c kafkaMirroringConnector -t kafkaTransportProvider -m '{"owner":"root","system.reuseExistingDestination":"false"}' 2>/dev/null
Checking the datastream:
bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/ 2>/dev/null
[2020-10-14 05:55:45,087] INFO Creating RestClient for http://localhost:32311/ with {}, count=1 (com.linkedin.datastream.DatastreamRestClientFactory)
[2020-10-14 05:55:45,113] INFO The service 'null' has been assigned to the ChannelPoolManager with key 'noSpecifiedNamePrefix 1138266797 ' (com.linkedin.r2.transport.http.client.HttpClientFactory)
[2020-10-14 05:55:45,215] INFO DatastreamRestClient created with retryPeriodMs=6000 retryTimeoutMs=90000 (com.linkedin.datastream.DatastreamRestClient)
[2020-10-14 05:55:45,502] INFO getAllDatastreams took 272 ms (com.linkedin.datastream.DatastreamRestClient)
{
"name" : "first-mirroring-stream",
"connectorName" : "kafkaMirroringConnector",
"transportProviderName" : "kafkaTransportProvider",
"source" : {
"connectionString" : "kafka://localhost:9092/test"
},
"Status" : "READY",
"destination" : {
"connectionString" : "kafka://kafka-dest:9092/*"
},
"metadata" : {
"datastreamUUID" : "df081002-fc7b-4f3a-b1ce-016e879d4b29",
"group.id" : "first-mirroring-stream",
"owner" : "root",
"system.IsConnectorManagedDestination" : "true",
"system.creation.ms" : "1602665999603",
"system.destination.KafkaBrokers" : "kafka-dest:9092",
"system.reuseExistingDestination" : "false",
"system.taskPrefix" : "first-mirroring-stream"
}
}
Once this is running, I produce messages on the source and consume on the destination, but nothing is mirrored.
Does anyone have a clue what I'm missing/what I did wrong?
Thanks!
This was an issue on my end - I had a typo in the topic name configured for mirroring.
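A typo like this can be caught by comparing the topic name in the datastream's source connection string against the topics that actually exist on the source cluster. Below is a minimal sketch; the helper name `topic_listed` is hypothetical, and the usage comment assumes Kafka's `kafka-topics.sh` script is available and that the source broker listens on localhost:9092 as in the datastream above.

```shell
# topic_listed: succeed only if the topic name given as $1 appears on stdin
# as a complete line (one topic name per line). -F treats the name as a fixed
# string, -x requires a whole-line match, -q suppresses output.
topic_listed() {
  grep -Fxq -- "$1"
}

# Usage (not executed here):
#   kafka-topics.sh --bootstrap-server localhost:9092 --list | topic_listed test \
#     && echo "topic exists" || echo "topic not found: check the name"
```

The whole-line match matters here: a plain substring search would report `test` as present even if only `test-2` existed.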
I am trying to run the Cassandra sink connector for Confluent Platform. The cassandra-sink.json file is as follows:
{
"name" : "cassandra-sink",
"config" : {
"connector.class" : "io.confluent.connect.cassandra.CassandraSinkConnector",
"tasks.max" : "1",
"topics" : "topic1",
"cassandra.contact.points" : "127.0.0.1",
"cassandra.keyspace" : "test",
"confluent.topic.bootstrap.servers": "127.0.0.1:9092",
"cassandra.write.mode" : "Update",
"connect.cassandra.port":"127.0.0.1:9042"
}
}
I installed the connector with confluent-hub install confluentinc/kafka-connect-cassandra:latest, as per the link.
I am able to load the file, but when I check the status I get the error below. I am unable to figure out what the issue is.
FAILED worker_id:127.0.0.1:8083,trace:com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed
com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1:9042] Cannot connect
com.datastax.driver.core.ControlConnection.reconnectInternal
com.datastax.driver.core.ControlConnection.connect
com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect
com.datastax.driver.core.Cluster$Manager.init
com.datastax.driver.core.Cluster.init
com.datastax.driver.core.SessionManager.initAsync
com.datastax.driver.core.SessionManager.executeAsync
com.datastax.driver.core.AbstractSession.execute
io.confluent.connect.cassandra.CassandraSessionImpl.executeStatement
io.confluent.connect.cassandra.CassandraSinkConnector.doStart
io.confluent.connect.cassandra.CassandraSinkConnector.start
org.apache.kafka.connect.runtime.WorkerConnector.doStart
org.apache.kafka.connect.runtime.WorkerConnector.start
org.apache.kafka.connect.runtime.WorkerConnector.transitionTo
org.apache.kafka.connect.runtime.Worker.startConnector
org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector
org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300
org.apache.kafka.connect.runtime.distributed.DistributedHerder$14
org.apache.kafka.connect.runtime.distributed.DistributedHerder$14
java.util.concurrent.FutureTask.run
java.util.concurrent.ThreadPoolExecutor.runWorker
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run
Please guide.
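The trace shows the driver failing to open a connection to 127.0.0.1:9042, so one thing worth ruling out is whether anything accepts TCP connections on that address from the host where the Connect worker runs. The following is a rough sketch, assuming `bash` and the coreutils `timeout` command are available; the function name `port_open` is hypothetical. It only tests reachability and says nothing about whether the connector configuration itself is correct.

```shell
# port_open: succeed if a TCP connection to host $1 on port $2 can be opened
# within 3 seconds. Uses bash's /dev/tcp redirection; host and port are passed
# as positional arguments ($0 and $1 inside the inner script).
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (not executed here):
#   port_open 127.0.0.1 9042 && echo "reachable" || echo "cannot connect"
```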
I have an application (spring-boot-shipping-service) with a KStream that gets OrderCreatedEvent messages generated by an external producer (spring-boot-order-service). This producer uses the following schema:
order-created-event.avsc
{
"namespace" : "com.codependent.statetransfer.order",
"type" : "record",
"name" : "OrderCreatedEvent",
"fields" : [
{"name":"id","type":"int"},
{"name":"productId","type":"int"},
{"name":"customerId","type":"int"}
]
}
My KStream<Int, OrderCreatedEvent> is joined with a KTable<Int, Customer> and publishes to the order topic a new kind of message: OrderShippedEvent.
order-shipped-event.avsc
{
"namespace" : "com.codependent.statetransfer.order",
"type" : "record",
"name" : "OrderShippedEvent",
"fields" : [
{"name":"id","type":"int"},
{"name":"productId","type":"int"},
{"name":"customerName","type":"string"},
{"name":"customerAddress","type":"string"}
]
}
For some reason, the new OrderShippedEvent messages are produced with the header application/vnd.ordercreatedevent.v1+avro instead of application/vnd.ordershippedevent.v1+avro.
This is the original OrderCreatedEvent in the order topic:
Key (4 bytes): +
Value (4 bytes): V?
Timestamp: 1555943926163
Partition: 0
Offset: 34
Headers: contentType="application/vnd.ordercreatedevent.v1+avro",spring_json_header_types={"contentType":"java.lang.String"}
And the produced OrderShippedEvent with the incorrect schema:
Key (4 bytes): +
Value (26 bytes): V?
JamesHill Street
Timestamp: 1555943926163
Partition: 0
Offset: 35
Headers: contentType="application/vnd.ordercreatedevent.v1+avro",spring_json_header_types={"contentType":"java.lang.String"}
I've checked the Confluent Schema Registry contents, and the order-shipped-event.avsc schema is there.
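A check of this kind can be repeated from the command line by listing the registered versions of a subject through the Schema Registry REST API. The sketch below only builds the URL; the helper name `subject_versions_url` is hypothetical, while the registry address and the subject name `ordershippedevent` are taken from the configuration and logs in this post.

```shell
# subject_versions_url: print the Schema Registry URL that lists the versions
# registered under a subject. $1 = registry base URL, $2 = subject name.
# A trailing slash on the base URL is stripped to avoid a double slash.
subject_versions_url() {
  printf '%s/subjects/%s/versions\n' "${1%/}" "$2"
}

# Usage (not executed here):
#   curl -s "$(subject_versions_url http://localhost:8081 ordershippedevent)"
```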
Why isn't it using the correct schema in the generated message?
Below you can see the full configuration and code of the example, which is also available on GitHub (https://github.com/codependent/event-carried-state-transfer/tree/avro)
In order to test it just start a Confluent Platform (v5.2.1), spring-boot-customer-service, spring-boot-order-service, spring-boot-shipping-service and execute the following curl commands:
curl -X POST http://localhost:8080/customers -d '{"id":1,"name":"James","address":"Hill Street"}' -H "content-type: application/json"
curl -X POST http://localhost:8084/orders -H "content-type: application/json" -d '{"id":1,"productId":1001,"customerId":1}'
application.yml
server:
  port: 8085
spring:
  application:
    name: spring-boot-shipping-service
  cloud:
    stream:
      kafka:
        streams:
          binder:
            configuration:
              default:
                key:
                  serde: org.apache.kafka.common.serialization.Serdes$IntegerSerde
      bindings:
        input:
          destination: customer
          contentType: application/*+avro
        order:
          destination: order
          contentType: application/*+avro
        output:
          destination: order
          contentType: application/*+avro
      schema-registry-client:
        endpoint: http://localhost:8081
ShippingKStreamProcessor
interface ShippingKStreamProcessor {
    @Input("input")
    fun input(): KStream<Int, Customer>
    @Input("order")
    fun order(): KStream<String, OrderCreatedEvent>
    @Output("output")
    fun output(): KStream<String, OrderShippedEvent>
}
ShippingKStreamConfiguration
@StreamListener
@SendTo("output")
fun process(@Input("input") input: KStream<Int, Customer>, @Input("order") orderEvent: KStream<Int, OrderCreatedEvent>): KStream<Int, OrderShippedEvent> {
    val serdeConfig = mapOf(
        AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG to "http://localhost:8081")
    val intSerde = Serdes.IntegerSerde()
    val customerSerde = SpecificAvroSerde<Customer>()
    customerSerde.configure(serdeConfig, true)
    val orderCreatedSerde = SpecificAvroSerde<OrderCreatedEvent>()
    orderCreatedSerde.configure(serdeConfig, true)
    val orderShippedSerde = SpecificAvroSerde<OrderShippedEvent>()
    orderShippedSerde.configure(serdeConfig, true)
    val stateStore: Materialized<Int, Customer, KeyValueStore<Bytes, ByteArray>> =
        Materialized.`as`<Int, Customer, KeyValueStore<Bytes, ByteArray>>("customer-store")
            .withKeySerde(intSerde)
            .withValueSerde(customerSerde)
    val customerTable: KTable<Int, Customer> = input.groupByKey(Serialized.with(intSerde, customerSerde))
        .reduce({ _, y -> y }, stateStore)
    return (orderEvent.filter { _, value -> value is OrderCreatedEvent && value.id != 0 }
        .selectKey { _, value -> value.customerId } as KStream<Int, OrderCreatedEvent>)
        .join(customerTable, { orderIt, customer ->
            OrderShippedEvent(orderIt.id, orderIt.productId, customer.name, customer.address)
        }, Joined.with(intSerde, orderCreatedSerde, customerSerde))
        .selectKey { _, value -> value.id }
}
UPDATE: I've set trace logging level for org.springframework.messaging and apparently it looks ok:
2019-04-22 23:40:39.953 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : HTTP GET http://localhost:8081/subjects/ordercreatedevent/versions/1
2019-04-22 23:40:39.971 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Accept=[application/json, application/*+json]
2019-04-22 23:40:39.972 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Writing [] as "application/vnd.schemaregistry.v1+json"
2019-04-22 23:40:39.984 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Response 200 OK
2019-04-22 23:40:39.985 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Reading to [java.util.Map<?, ?>]
2019-04-22 23:40:40.186 INFO 46039 --- [read-1-producer] org.apache.kafka.clients.Metadata : Cluster ID: 5Sw6sBD0TFOaximF3Or-dQ
2019-04-22 23:40:40.318 DEBUG 46039 --- [-StreamThread-1] AvroSchemaRegistryClientMessageConverter : Obtaining schema for class class com.codependent.statetransfer.order.OrderShippedEvent
2019-04-22 23:40:40.318 DEBUG 46039 --- [-StreamThread-1] AvroSchemaRegistryClientMessageConverter : Avro type detected, using schema from object
2019-04-22 23:40:40.342 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : HTTP POST http://localhost:8081/subjects/ordershippedevent/versions
2019-04-22 23:40:40.342 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Accept=[application/json, application/*+json]
2019-04-22 23:40:40.342 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Writing [{"schema":"{\"type\":\"record\",\"name\":\"OrderShippedEvent\",\"namespace\":\"com.codependent.statetransfer.order\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"productId\",\"type\":\"int\"},{\"name\":\"customerName\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"customerAddress\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}"}] as "application/json"
2019-04-22 23:40:40.348 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Response 200 OK
2019-04-22 23:40:40.348 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Reading to [java.util.Map<?, ?>]
2019-04-22 23:40:40.349 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : HTTP POST http://localhost:8081/subjects/ordershippedevent
2019-04-22 23:40:40.349 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Accept=[application/json, application/*+json]
2019-04-22 23:40:40.349 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Writing [{"schema":"{\"type\":\"record\",\"name\":\"OrderShippedEvent\",\"namespace\":\"com.codependent.statetransfer.order\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"productId\",\"type\":\"int\"},{\"name\":\"customerName\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}},{\"name\":\"customerAddress\",\"type\":{\"type\":\"string\",\"avro.java.string\":\"String\"}}]}"}] as "application/json"
2019-04-22 23:40:40.361 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Response 200 OK
2019-04-22 23:40:40.362 DEBUG 46039 --- [-StreamThread-1] o.s.web.client.RestTemplate : Reading to [java.util.Map<?, ?>]
2019-04-22 23:40:40.362 DEBUG 46039 --- [-StreamThread-1] AvroSchemaRegistryClientMessageConverter : Finding correct DatumWriter for type com.codependent.statetransfer.order.OrderShippedEvent
How come the message is written with an incorrect content type header then?
UPDATE 2:
I've kept digging into the source code and found this:
KafkaStreamsMessageConversionDelegate correctly converts and determines the right header values, as seen in the logs above.
However, the serializeOnOutbound method returns only the payload to the Kafka API, so the headers it computed are not taken into account:
return messageConverter.toMessage(message.getPayload(), messageHeaders).getPayload();
Further along in the record processing, org.apache.kafka.streams.processor.internals.SinkNode.process() reads the headers present in the context, which incorrectly contain application/vnd.ordercreatedevent.v1+avro instead of application/vnd.ordershippedevent.v1+avro (?):
collector.send(topic, key, value, context.headers(), timestamp, keySerializer, valSerializer, partitioner);
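The suspected mechanism can be illustrated with a toy model in plain Java. This is only a sketch of the behaviour described above, not the binder's or Kafka's actual code: the class StaleHeaderSketch, its nested Message type, and the methods convert, serializePayloadOnly, send and demo are all hypothetical names invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these types belong to Spring Cloud Stream or Kafka.
final class StaleHeaderSketch {

    // Hypothetical stand-in for a message: a payload plus its headers.
    static final class Message {
        final String payload;
        final Map<String, String> headers;

        Message(String payload, Map<String, String> headers) {
            this.payload = payload;
            this.headers = headers;
        }
    }

    // Mimics the conversion step: builds a message carrying the correct content type.
    static Message convert(String payload, String contentType) {
        Map<String, String> headers = new HashMap<>();
        headers.put("contentType", contentType);
        return new Message(payload, headers);
    }

    // Like the return statement quoted above, hands back only the payload,
    // so the freshly computed headers are discarded.
    static String serializePayloadOnly(String payload, String contentType) {
        return convert(payload, contentType).payload;
    }

    // Mimics the sink: pairs the new value with whatever headers the
    // record context still holds from the inbound record.
    static Message send(String value, Map<String, String> contextHeaders) {
        return new Message(value, contextHeaders);
    }

    static Message demo() {
        // Headers that arrived with the inbound OrderCreatedEvent.
        Map<String, String> contextHeaders = new HashMap<>();
        contextHeaders.put("contentType", "application/vnd.ordercreatedevent.v1+avro");

        // The outbound value is converted with the right content type...
        String value = serializePayloadOnly(
                "OrderShippedEvent bytes",
                "application/vnd.ordershippedevent.v1+avro");

        // ...but the record is sent with the untouched context headers.
        return send(value, contextHeaders);
    }
}
```

Calling StaleHeaderSketch.demo() yields a message whose payload is the shipped event while its contentType header is still the one from the inbound OrderCreatedEvent, which matches what shows up on the order topic.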
UPDATE 3:
Steps to reproduce:
Download and start Confluent 5.2.1
confluent start
Start the applications spring-boot-order-service, spring-boot-customer-service, spring-boot-shipping-service
Create a customer: curl -X POST http://localhost:8080/customers -d '{"id":1,"name":"John","address":"Some Street"}' -H "content-type: application/json"
Create an order that will be joined with the customer: curl -X POST http://localhost:8084/orders -H "content-type: application/json" -d '{"id":1,"productId":1,"customerId":1}'
ShippingKStreamConfiguration's process() will create a KTable for the Customer and a state store (customer-store). It will also join the order stream with the customer KTable to transform an OrderCreatedEvent into an OrderShippedEvent.
You can check that the newly created OrderShippedEvent message added to the order topic has an incorrect header. This can be seen either in the Confluent Control Center (localhost:9092 -> topics -> order) or by running kafkacat:
$> kafkacat -b localhost:9092 -t order -C \
-f '\nKey (%K bytes): %k
Value (%S bytes): %s
Timestamp: %T
Partition: %p
Offset: %o
Headers: %h\n'
@codependent This is indeed an issue in the binder, and we will fix it soon. In the meantime, as a workaround, can you make your processor not return a KStream and instead do the sending inside the method itself? You can call to(TopicNameExtractor) on the KStream you currently return. The TopicNameExtractor gives you access to the record context, which you can use to set the content type manually.
Spark (v2.4) program function:
Read JSON data from a Kafka topic using Spark Structured Streaming
Print the data to the console as is
Issue I'm getting:
- The console keeps logging: Resetting offset for partition nifi-log-batch-0 to offset 2826180.
Source code:
package io.xyz.streaming

import org.apache.spark.sql.avro._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.functions._

object readKafkaJson {
  private val topic = "nifi-log-batch"
  private val kafkaUrl = "http://<hostname>:9092"
  private val chk = "/home/xyz/tmp/checkpoint"
  private val outputFileLocation = "/home/xyz/abc/data"
  private val sparkSchema = StructType(Array(
    StructField("timestamp", StringType),
    StructField("level", StringType),
    StructField("thread", StringType),
    StructField("class", StringType),
    StructField("message", StringType),
    StructField("updatedOn", StringType),
    StructField("stackTrace", StringType)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("ConfluentConsumer")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // ===================Read Kafka data in JSON==================
    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", kafkaUrl)
      .option("startingOffsets", "latest")
      .option("subscribe", topic)
      .load()
    val dfs1 = df
      .selectExpr("CAST(value AS STRING)")
      .select(from_json(col("value"), sparkSchema).alias("my_column"))
      .select("my_column.*")

    // ===================Write to console==================
    dfs1
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
Detailed issue log on console:
2019-04-10 01:12:58 INFO WriteToDataSourceV2Exec:54 - Start processing data source writer: org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@622d0057. The input RDD has 0 partitions.
2019-04-10 01:12:58 INFO SparkContext:54 - Starting job: start at readKafkaJson.scala:70
2019-04-10 01:12:58 INFO DAGScheduler:54 - Job 0 finished: start at readKafkaJson.scala:70, took 0.003870 s
2019-04-10 01:12:58 INFO WriteToDataSourceV2Exec:54 - Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@622d0057 is committing.
-------------------------------------------
Batch: 0
-------------------------------------------
2019-04-10 01:12:58 INFO CodeGenerator:54 - Code generated in 41.952695 ms
+---------+-----+------+-----+-------+---------+----------+
|timestamp|level|thread|class|message|updatedOn|stackTrace|
+---------+-----+------+-----+-------+---------+----------+
+---------+-----+------+-----+-------+---------+----------+
2019-04-10 01:12:58 INFO WriteToDataSourceV2Exec:54 - Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@622d0057 committed.
2019-04-10 01:12:58 INFO SparkContext:54 - Starting job: start at readKafkaJson.scala:70
2019-04-10 01:12:58 INFO DAGScheduler:54 - Job 1 finished: start at readKafkaJson.scala:70, took 0.000104 s
2019-04-10 01:12:58 INFO CheckpointFileManager:54 - Writing atomically to file:/tmp/temporary-df2fea18-7b2f-4146-bcfd-7923cfab65e7/commits/0 using temp file file:/tmp/temporary-df2fea18-7b2f-4146-bcfd-7923cfab65e7/commits/.0.eb290a31-1965-40e7-9028-d18f2eea0627.tmp
2019-04-10 01:12:58 INFO CheckpointFileManager:54 - Renamed temp file file:/tmp/temporary-df2fea18-7b2f-4146-bcfd-7923cfab65e7/commits/.0.eb290a31-1965-40e7-9028-d18f2eea0627.tmp to file:/tmp/temporary-df2fea18-7b2f-4146-bcfd-7923cfab65e7/commits/0
2019-04-10 01:12:58 INFO MicroBatchExecution:54 - Streaming query made progress: {
"id" : "fb44fbef-5d05-4bb8-ae72-3327b98af261",
"runId" : "ececfe49-bbc6-4964-8798-78980cbec525",
"name" : null,
"timestamp" : "2019-04-10T06:12:56.414Z",
"batchId" : 0,
"numInputRows" : 0,
"processedRowsPerSecond" : 0.0,
"durationMs" : {
"addBatch" : 1324,
"getBatch" : 10,
"getEndOffset" : 1,
"queryPlanning" : 386,
"setOffsetRange" : 609,
"triggerExecution" : 2464,
"walCommit" : 55
},
"stateOperators" : [ ],
"sources" : [ {
"description" : "KafkaV2[Subscribe[nifi-log-batch]]",
"startOffset" : null,
"endOffset" : {
"nifi-log-batch" : {
"0" : 2826180
}
},
"numInputRows" : 0,
"processedRowsPerSecond" : 0.0
} ],
"sink" : {
"description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@6ced6212"
}
}
2019-04-10 01:12:58 INFO Fetcher:583 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-9a027b2b-0a3a-4773-a356-a585e488062c--81433247-driver-0] Resetting offset for partition nifi-log-batch-0 to offset 2826180.
2019-04-10 01:12:58 INFO MicroBatchExecution:54 - Streaming query made progress: {
"id" : "fb44fbef-5d05-4bb8-ae72-3327b98af261",
"runId" : "ececfe49-bbc6-4964-8798-78980cbec525",
"name" : null,
"timestamp" : "2019-04-10T06:12:58.935Z",
"batchId" : 1,
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0,
"durationMs" : {
"getEndOffset" : 1,
"setOffsetRange" : 11,
"triggerExecution" : 15
},
"stateOperators" : [ ],
"sources" : [ {
"description" : "KafkaV2[Subscribe[nifi-log-batch]]",
"startOffset" : {
"nifi-log-batch" : {
"0" : 2826180
}
},
"endOffset" : {
"nifi-log-batch" : {
"0" : 2826180
}
},
"numInputRows" : 0,
"inputRowsPerSecond" : 0.0,
"processedRowsPerSecond" : 0.0
} ],
"sink" : {
"description" : "org.apache.spark.sql.execution.streaming.ConsoleSinkProvider@6ced6212"
}
}
2019-04-10 01:12:58 INFO Fetcher:583 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-9a027b2b-0a3a-4773-a356-a585e488062c--81433247-driver-0] Resetting offset for partition nifi-log-batch-0 to offset 2826180.
2019-04-10 01:12:58 INFO Fetcher:583 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-9a027b2b-0a3a-4773-a356-a585e488062c--81433247-driver-0] Resetting offset for partition nifi-log-batch-0 to offset 2826180.
2019-04-10 01:12:58 INFO Fetcher:583 - [Consumer clientId=consumer-1, groupId=spark-kafka-source-9a027b2b-0a3a-4773-a356-a585e488062c--81433247-driver-0] Resetting offset for partition nifi-log-batch-0 to offset 2826180.
I face the same issue when I run equivalent code in PySpark.
Please suggest how to resolve this issue.
Kafka: v2.1.0 cpl, confluent
Spark: 2.4
The job is submitted with the following command:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 --jars /home/xyz/Softwares/spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar --class io.xyz.streaming.readKafkaJson --master local[*] /home/xyz/ScalaCode/target/SparkSchemaKafka-0.0.1-SNAPSHOT-jar-with-dependencies.jar
It seems the asker already found the solution. Here are the relevant parts from the comments:
Main resolution
It was an issue of a schema structure in Scala. After correcting the
schema the issue resolved.
Secondary topic
in Pyspark code the processing is happening but the messages are not
stopping i.e. I am able to run the code and able to write the stream
data into a JSON file, but the console messages are filled with the
above mentioned Resetting offset for ... log messages
That pyspark issue was actually, INFO messages were getting printed,
which I disabled
After which all was good.