Kafka Upgrade from 5.4.1 to 6.1.2 with Confluent Playbooks

Currently I'm trying to upgrade our cluster with the Confluent Playbooks. I set up a local environment with Vagrant where I simulate our production environment.
I must be honest, I'm experimenting with something quite unusual, because our current setup was not installed with the Confluent Playbooks. I know the documentation says I should only use the Upgrade Playbooks if the cluster was installed with the Install Playbooks using the same hosts.yml.
Anyway, I'm trying to find out whether it would be possible to use the official Confluent Upgrade Playbooks. It would potentially save us a lot of time if I don't have to create my own upgrade playbooks.
The ZooKeeper upgrade was successful, and now I'm trying to upgrade the brokers, roughly as shown below.
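For reference, this is roughly how I'm invoking the broker upgrade (the playbook name is taken from my cp-ansible checkout, so treat it as illustrative):
# run from the cp-ansible root directory, against the hosts.yml shown further below
ansible-playbook -i hosts.yml upgrade_kafka_broker.yml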
During the upgrade, the broker goes into a "failed" state. If I try to restart the broker service, I get the following error:
[2021-07-12 08:59:52,144] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at io.confluent.rbacapi.resources.base.AuditLogConfigResource.<init>(AuditLogConfigResource.java:78)
at io.confluent.rbacapi.resources.v1.V1AuditLogConfigResource.<init>(V1AuditLogConfigResource.java:47)
at io.confluent.rbacapi.app.RbacApiApplication.setupResources(RbacApiApplication.java:339)
at io.confluent.rest.Application.configureHandler(Application.java:258)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:227)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at io.confluent.http.server.KafkaHttpServerImpl.doStart(KafkaHttpServerImpl.java:105)
at java.lang.Thread.run(Thread.java:748)
[2021-07-12 08:59:52,145] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
[2021-07-12 08:59:52,141] WARN KafkaHttpServer transitioned from STARTING to FAILED.: null. (io.confluent.http.server.KafkaHttpServerImpl)
java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at io.confluent.rbacapi.resources.base.AuditLogConfigResource.<init>(AuditLogConfigResource.java:78)
at io.confluent.rbacapi.resources.v1.V1AuditLogConfigResource.<init>(V1AuditLogConfigResource.java:47)
at io.confluent.rbacapi.app.RbacApiApplication.setupResources(RbacApiApplication.java:339)
at io.confluent.rest.Application.configureHandler(Application.java:258)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:227)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at io.confluent.http.server.KafkaHttpServerImpl.doStart(KafkaHttpServerImpl.java:105)
at java.lang.Thread.run(Thread.java:748)
Since I see a reference to the RBAC API in the error logs above, I installed the confluent-security package, but it didn't help.
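In case it matters, this is roughly how I checked which Confluent packages ended up on the broker hosts and whether any embedded REST/metadata settings landed in the generated broker config (the paths are the cp-ansible defaults in my environment, so treat them as assumptions):
# list installed Confluent packages on a broker host
rpm -qa | grep -i confluent        # or: dpkg -l | grep -i confluent
# look for embedded REST / metadata related settings in the generated broker config
grep -iE 'confluent\.(http|metadata)' /etc/kafka/server.properties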
My hosts.yml file in my cp-ansible root directory looks like this:
---
all:
  vars:
    ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
    ansible_connection: ssh
    ansible_user: vagrant
    ansible_become: true
    ansible_port: 22
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    sasl_protocol: scram
    ssl_enabled: true
    ssl_provided_keystore_and_truststore: true
    ssl_keystore_filepath: "/vagrant/ssl/kafka.server.keystore.jks"
    ssl_keystore_key_password: pass_key
    ssl_keystore_store_password: pass_key
    ssl_truststore_filepath: "/vagrant/ssl/kafka.server.truststore.jks"
    ssl_truststore_password: pass_trust
    rbac_enabled: false
    mds_super_user: mds
    mds_super_user_password: password
    kafka_broker_ldap_user: mds
    kafka_broker_ldap_password: password
    schema_registry_ldap_user: mds
    schema_registry_ldap_password: password
    ksql_ldap_user: mds
    ksql_ldap_password: password
    control_center_ldap_user: mds
    control_center_ldap_password: password
    create_mds_certs: false
    token_services_public_pem_file: /vagrant/ssl/mds.publickey.pem
    token_services_private_pem_file: /vagrant/ssl/mds.tokenkeypair.pem
    kafka_broker_cluster_name: broker-cluster
    schema_registry_cluster_name: schema-registry-cluster
    kafka_broker_principal: User:mds
    confluent_server_enabled: true
    kafka_broker_schema_validation_enabled: true
    kafka_broker_custom_listeners:
      broker:
        name: SSL
        port: 9093
        ssl_enabled: true
        ssl_mutual_auth_enabled: true
        sasl_protocol: none
zookeeper:
  hosts:
    bro1:
    bro2:
    bro3:
kafka_broker:
  vars:
    kafka_broker_custom_properties:
      ldap.java.naming.factory.initial: com.sun.jndi.ldap.LdapCtxFactory
      ldap.com.sun.jndi.ldap.read.timeout: 3000
      ldap.java.naming.provider.url: ldap://192.168.0.198:10389
      ldap.user.search.base: ou=TECH,ou=SPEZ-USER,o=VISA
      ldap.group.search.base: ou=TECH,ou=SPEZ-USER,o=VISA
      ldap.user.name.attribute: cn
      ldap.user.memberof.attribute.pattern: cn=(.*),ou=TECH,ou=SPEZ-USER,o=TEST
      ldap.group.name.attribute: cn
      ldap.group.member.attribute.pattern: cn=(.*),ou=TECH,ou=SPEZ-USER,o=TEST
      ldap.user.object.class: person
  hosts:
    bro1:
    bro2:
    bro3:
schema_registry:
  hosts:
    reg1:
control_center:
  hosts:
    cc1:
Do you have any hints on where I should look for the issue?

Related

Mosquitto MQTT Broker Dynamic Security Plugin not working in ubuntu

I am trying to implement the dynamic security plugin in the Mosquitto broker using the following documentation:
https://mosquitto.org/documentation/dynamic-security/#installation
My mosquitto.conf file looks like this:
listener 1883
allow_anonymous false
per_listener_settings false
persistence true
persistence_location /var/lib/mosquitto/
plugin /usr/lib/x86_64-linux-gnu/mosquitto_dynamic_security.so
plugin_opt_config_file /var/lib/mosquitto/dynamic-security.json
log_dest file /var/log/mosquitto/mosquitto.log
include_dir /etc/mosquitto/conf.d
After making the configuration changes, the mosquitto service fails to start, and when I check sudo journalctl -xu mosquitto I can see an error:
Mosquitto Error: Unknown configuration variable "plugin"
The plugin is available at /usr/lib/x86_64-linux-gnu/mosquitto_dynamic_security.so
and it has 777 permissions. Am I missing something? The same configuration works on Windows.
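One thing worth checking (an assumption on my part, since "Unknown configuration variable" for plugin is typical of brokers older than 2.0, where the directive was still auth_plugin): confirm which Mosquitto version the service is actually running, for example:
# the first line of the help output includes "mosquitto version X.Y.Z"
mosquitto -h | head -n 1
# or check the installed package version on Debian/Ubuntu
dpkg -s mosquitto | grep -i version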

"SchemaRegistryException: Failed to get Kafka cluster ID" for LOCAL setup

I downloaded the .tar.gz (I am on macOS) for Confluent version 7.0.0 from the official Confluent site and was following the setup for LOCAL (1 node). Kafka/ZooKeeper start fine, but the Schema Registry keeps failing. (Note: I am behind a corporate VPN.)
The exception message in the SchemaRegistry logs is:
[2021-11-04 00:34:22,492] INFO Logging initialized #1403ms to org.eclipse.jetty.util.log.Slf4jLog (org.eclipse.jetty.util.log)
[2021-11-04 00:34:22,543] INFO Initial capacity 128, increased by 64, maximum capacity 2147483647. (io.confluent.rest.ApplicationServer)
[2021-11-04 00:34:22,614] INFO Adding listener: http://0.0.0.0:8081 (io.confluent.rest.ApplicationServer)
[2021-11-04 00:35:23,007] ERROR Error starting the schema registry (io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication)
io.confluent.kafka.schemaregistry.exceptions.SchemaRegistryException: Failed to get Kafka cluster ID
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1488)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.<init>(KafkaSchemaRegistry.java:166)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.initSchemaRegistry(SchemaRegistryRestApplication.java:71)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryRestApplication.configureBaseApplication(SchemaRegistryRestApplication.java:90)
at io.confluent.rest.Application.configureHandler(Application.java:271)
at io.confluent.rest.ApplicationServer.doStart(ApplicationServer.java:245)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:73)
at io.confluent.kafka.schemaregistry.rest.SchemaRegistryMain.main(SchemaRegistryMain.java:44)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:180)
at io.confluent.kafka.schemaregistry.storage.KafkaSchemaRegistry.kafkaClusterId(KafkaSchemaRegistry.java:1486)
... 7 more
My schema-registry.properties file has the bootstrap URL set to:
kafkastore.bootstrap.servers=PLAINTEXT://localhost:9092
I saw some posts saying it's potentially the Schema Registry being unable to connect to the Kafka cluster URL because of the localhost address. I am fairly new to Kafka and basically just need this local setup to run a Git repo that uses some topics/Kafka, so my questions are:
How can I fix this? (I am behind a corporate VPN, but I figured that shouldn't affect this.)
Do I even need the Schema Registry?
I ended up just going with the local Docker setup instead, and the only change I had to make to the Docker Compose YAML was the schema-registry port (I changed it to 8082 or 8084, I don't remember exactly, just an unused port not taken by another Confluent service listed in the docker-compose.yaml), and my local setup is working fine now.
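As a side note for anyone hitting the same timeout: before suspecting Schema Registry itself, it is worth confirming that the broker answers on the bootstrap address, for example with the CLI shipped in the same tarball (path relative to the extracted directory):
# prints the broker's supported API versions if localhost:9092 is reachable
bin/kafka-broker-api-versions --bootstrap-server localhost:9092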

Setting Up HLF network V1.4 with tls enabled and kafka based ordering

I am creating an HLF v1.4 network with TLS enabled and Kafka-based ordering, but when I try to create a channel it throws an error, and the orderer logs show an error as well.
TLS configs in the network:
Peer configs:
- CORE_PEER_TLS_ENABLED=true
- CORE_PEER_GOSSIP_SKIPHANDSHAKE=true
- CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/crypto/peer/tls/server.crt
- CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/crypto/peer/tls/server.key
- CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/crypto/peer/tls/ca.crt
Orderer configs:
# enabled TLS
- ORDERER_GENERAL_TLS_ENABLED=true
- ORDERER_GENERAL_TLS_PRIVATEKEY=/etc/hyperledger/crypto/orderer/tls/server.key
- ORDERER_GENERAL_TLS_CERTIFICATE=/etc/hyperledger/crypto/orderer/tls/server.crt
- ORDERER_GENERAL_TLS_ROOTCAS=[/etc/hyperledger/crypto/orderer/tls/ca.crt, /etc/hyperledger/crypto/peer/tls/ca.crt]
CLI configs:
- CORE_PEER_TLS_ENABLED=true
- CORE_PEER_TLS_CERT_FILE=/etc/hyperledger/peer/peers/peer0.org1/tls/server.crt
- CORE_PEER_TLS_KEY_FILE=/etc/hyperledger/peer/peers/peer0.org1/tls/server.key
- CORE_PEER_TLS_ROOTCERT_FILE=/etc/hyperledger/peer/peers/peer0.org1/tls/ca.crt
Can anyone help me in this regard?
As the error says, there is a bad certificate while creating the channel; the orderer certificate is not found, which is why you get the bad certificate error.
In the compose.yaml file, set the environment variable FABRIC_LOGGING_SPEC=DEBUG to see exactly what the error is.
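A minimal sketch of where that variable would go, assuming a typical orderer service definition in the compose file (service and image names here are illustrative, not taken from the question):
services:
  orderer.example.com:
    image: hyperledger/fabric-orderer:1.4
    environment:
      - FABRIC_LOGGING_SPEC=DEBUG
      # the existing TLS settings shown above stay as they are
      - ORDERER_GENERAL_TLS_ENABLED=true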

ERROR Exiting Kafka due to fatal exception (kafka.Kafka$) on Windows - Apache Kafka

I'm getting the error below while starting the Kafka server on a Windows machine. I downloaded Scala 2.11 - kafka_2.11-2.1.0.tgz from https://kafka.apache.org/downloads and did the following steps:
Go to the config folder in Apache Kafka (C:\Apache-Kafka\kafka_2.11-2.1.0\config) and edit "server.properties" using any text editor.
Find log.dirs and replace the value after "=" (/tmp/kafka-logs) with C:\Apache-Kafka\kafka_2.11-2.1.0\kafka-logs.
Now simply start the server:
>kafka-server-start.bat C:\Apache-Kafka\kafka_2.11-2.1.0\config
Error:
C:\Apache-Kafka\kafka_2.11-2.1.0\bin\windows>kafka-server-start.bat C:\Apache-Kafka\kafka_2.11-2.1.0\config
[2018-12-14 21:09:34,566] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2018-12-14 21:09:34,583] ERROR Exiting Kafka due to fatal exception (kafka.Kafka$)
java.nio.file.AccessDeniedException: C:\Apache-Kafka\kafka_2.11-2.1.0\config
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:83)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97)
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:102)
at sun.nio.fs.WindowsFileSystemProvider.newByteChannel(WindowsFileSystemProvider.java:230)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:560)
at kafka.Kafka$.getPropsFromArgs(Kafka.scala:42)
at kafka.Kafka$.main(Kafka.scala:58)
at kafka.Kafka.main(Kafka.scala)
C:\Apache-Kafka\kafka_2.11-2.1.0\bin\windows>
Note: I've already set up Apache ZooKeeper on my Windows machine, and it's running on port 2181.
I ran the command using "Run as administrator".
Try this: after kafka-server-start.bat, use ..\..\config\server.properties (with a backslash between the two pairs of dots).
In my case it was:
In general, we should not use the C: drive to store kafka-logs. Try using a drive other than C: for storing the Kafka logs; that should work.
Change the property log.dirs={drive other than C:}/tmp/kafka-logs in KafkaHome/config/server.properties.
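To make the first suggestion above concrete, this is roughly the corrected invocation, run from the bin\windows directory (paths assume the layout shown in the question); the key point is that kafka-server-start.bat needs the server.properties file, not the config directory:
REM run from C:\Apache-Kafka\kafka_2.11-2.1.0\bin\windows
kafka-server-start.bat ..\..\config\server.properties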

Spinnaker "Create Application" menu doesn't load

I'm quite new to Spinnaker and have to ask for some help, I guess. Does anyone know why I can't create any Application and just keep seeing this screen?
My installation is through Halyard 1.5.0 and Ubuntu 14.04.
We don't use any cloud provider, but I did configure the Docker and Kubernetes parts.
And here is the error I see in the /var/log/spinnaker/echo/echo.log:
2017-11-16 13:52:29.901 INFO 13877 --- [ofit-/pipelines] c.n.s.echo.services.Front50Service : java.net.SocketTimeoutException: timeout
at okio.Okio$3.newTimeoutException(Okio.java:207)
at okio.AsyncTimeout.exit(AsyncTimeout.java:261)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:215)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:306)
at okio.RealBufferedSource.indexOf(RealBufferedSource.java:300)
at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:196)
at com.squareup.okhttp.internal.http.Http1xStream.readResponse(Http1xStream.java:186)
at com.squareup.okhttp.internal.http.Http1xStream.readResponseHeaders(Http1xStream.java:127)
at com.squareup.okhttp.internal.http.HttpEngine.readNetworkResponse(HttpEngine.java:739)
at com.squareup.okhttp.internal.http.HttpEngine.access$200(HttpEngine.java:87)
at com.squareup.okhttp.internal.http.HttpEngine$NetworkInterceptorChain.proceed(HttpEngine.java:724)
at com.squareup.okhttp.internal.http.HttpEngine.readResponse(HttpEngine.java:578)
at com.squareup.okhttp.Call.getResponse(Call.java:287)
at com.squareup.okhttp.Call$ApplicationInterceptorChain.proceed(Call.java:243)
at com.squareup.okhttp.Call.getResponseWithInterceptorChain(Call.java:205)
at com.squareup.okhttp.Call.execute(Call.java:80)
at retrofit.client.OkClient.execute(OkClient.java:53)
at retrofit.RestAdapter$RestHandler.invokeRequest(RestAdapter.java:326)
at retrofit.RestAdapter$RestHandler.access$100(RestAdapter.java:220)
at retrofit.RestAdapter$RestHandler$1.invoke(RestAdapter.java:265)
at retrofit.RxSupport$2.run(RxSupport.java:55)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at retrofit.Platform$Base$2$1.run(Platform.java:94)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Socket closed
at java.net.SocketInputStream.read(SocketInputStream.java:204)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at okio.Okio$2.read(Okio.java:139)
at okio.AsyncTimeout$2.read(AsyncTimeout.java:211)
... 24 more
2017-11-16 13:52:29.901 INFO 13877 --- [ofit-/pipelines] c.n.s.echo.services.Front50Service : ---- END ERROR
@grizzthedj, thanks again for the recommendations. However, that doesn't seem to have solved the issue. I wonder if it has something to do with my Docker registry or Kubernetes.
Here is what I have in my .hal/config:
dockerRegistry:
  enabled: true
  accounts:
  - name: <hidden-name>
    requiredGroupMembership: []
    address: https://docker-registry.<hidden-name>.net/
    cacheIntervalSeconds: 30
    repositories:
    - hellopod
    - demoapp
  primaryAccount: <hidden-name>
kubernetes:
  enabled: true
  accounts:
  - name: <username>
    requiredGroupMembership: []
    dockerRegistries:
    - accountName: <hidden-name>
      namespaces: []
    context: sre-os1-dev
    namespaces:
    - spinnaker
    omitNamespaces: []
    kubeconfigFile: /home/<username>/.kube/config
I suspect you may be using Redis as the persistent storage type (I ran into the same issue).
If that is the case, persistent storage using Redis doesn't seem to work properly out of the box, and it is not supported. I would try using an S3 target, if available.
More info here on support for Redis.
To configure S3 using Halyard, use the following commands:
echo <SECRET_ACCESS_KEY> | hal config storage s3 edit --access-key-id <ACCESS_KEY_ID> --endpoint <S3_ENDPOINT> --bucket <BUCKET_NAME> --root-folder spinnaker --secret-access-key
hal config storage edit --type s3
hal deploy apply
@grizzthedj,
here is what I've found inside front50.log (I wiped out the IDs, of course, for security reasons).
You may be right.
2017-11-20 12:40:29.151 INFO 682 --- [0.0-8080-exec-1] com.amazonaws.latency : ServiceName=[Amazon S3], AWSErrorCode=[NoSuchKey], StatusCode=[404], ServiceEndpoint=[https://s3-us-west-2.amazonaws.com], Exception=[com.amazonaws.services.s3.model.AmazonS3Exception: The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: ...; S3 Extended Request ID: ...), S3 Extended Request ID: ...], RequestType=[GetObjectRequest], AWSRequestID=[...], HttpClientPoolPendingCount=0, RetryCapacityConsumed=0, HttpClientPoolAvailableCount=1, RequestCount=1, Exception=1, HttpClientPoolLeasedCount=0, ClientExecuteTime=[39.634], HttpClientSendRequestTime=[0.072], HttpRequestTime=[39.213], RequestSigningTime=[0.067], CredentialsRequestTime=[0.001, 0.0], HttpClientReceiveResponseTime=[39.059],
I had a similar issue on Kubernetes/AWS. When I opened the Chrome dev console, I was getting lots of 404 errors trying to connect to localhost:8084, so I had to reconfigure the Deck and Gate base URLs. This is what I did using Halyard:
hal config security ui edit --override-base-url http://<deck-loadbalancer-dns-entry>:9000
hal config security api edit --override-base-url http://<gate-loadbalancer-dns-entry>:8084
I did hal deploy apply, and when it came back I noticed the developer console was throwing CORS errors, so I had to do the following:
echo "host: 0.0.0.0" | tee \ ~/.hal/default/service-settings/gate.yml \ ~/.hal/default/service-settings/deck.yml
You may note the lack of TLS and CORS config; this is a test system, so make better choices in production. :)