Starting KsqlRestApplication from Scala and getting NoSuchMethodError org.apache.kafka.streams.StreamsConfig.getConsumerConfigs - scala

I am trying to write a program that lets me run predefined KSQL operations on Kafka topics in Scala, but I don't want to open the KSQL CLI every time. Therefore I want to start the KSQL "Server" from within my Scala program. If I understand the KSQL source code right, I have to build and start a KsqlRestApplication:
def restServer = KsqlRestApplication.buildApplication(
  new KsqlRestConfig(defaultServerProperties),
  true,
  new VersionCheckerAgent {
    override def start(ksqlModuleType: KsqlModuleType, properties: Properties): Unit = ???
  }
)
But when I try doing that, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.kafka.streams.StreamsConfig.getConsumerConfigs(Ljava/lang/String;Ljava/lang/String;)Ljava/util/Map;
at io.confluent.ksql.rest.server.BrokerCompatibilityCheck.create(BrokerCompatibilityCheck.java:62)
at io.confluent.ksql.rest.server.KsqlRestApplication.buildApplication(KsqlRestApplication.java:241)
I looked into BrokerCompatibilityCheck: in its create function it calls StreamsConfig.getConsumerConfigs() with two Strings as parameters, instead of the parameters defined in
https://kafka.apache.org/0102/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(StreamThread,%20java.lang.String,%20java.lang.String).
Are my KSQL and Kafka versions simply not compatible, or am I doing something wrong?
I am using KSQL version 4.1.0-SNAPSHOT and Kafka version 1.0.0.

Yes, NoSuchMethodError typically indicates a version incompatibility between libraries.
The link you posted is to the Javadoc for Kafka 0.10.2. The method hasn't changed in 1.0, but in the upcoming 1.1 it takes only two Strings:
https://kafka.apache.org/11/javadoc/org/apache/kafka/streams/StreamsConfig.html#getConsumerConfigs(java.lang.String,%20java.lang.String)
That suggests the version of KSQL you're using (4.1.0-SNAPSHOT) depends on version 1.1 of Kafka Streams, which is currently in the release-candidate phase and I believe should be out soon:
https://lists.apache.org/thread.html/780c4458b16590e99261b69d7b41b9ec374a3226d72c8d38885a008a#%3Cusers.kafka.apache.org%3E
As per that email, you can find the latest (1.1.0-rc2) artifacts in the Apache staging repo:
https://repository.apache.org/content/groups/staging/
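If it helps, here is a minimal build sketch of how you might pull the matching Kafka Streams artifacts from that staging repo. It assumes an sbt build, and the exact version string is a guess rather than something confirmed by the answer above:

// build.sbt -- rough sketch, assuming the project is built with sbt
// and that the 1.1.0 RC artifacts are staged under this version number
resolvers += "Apache Staging" at "https://repository.apache.org/content/groups/staging/"

libraryDependencies += "org.apache.kafka" % "kafka-streams" % "1.1.0"

Once the final 1.1.0 release lands in Maven Central, the extra resolver can be dropped.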

Related

Using the Beam Python SDK and PortableRunner to connect to Kafka with SSL

I have the code below for connecting to Kafka using the Python Beam SDK. I know that the ReadFromKafka transform is run in a Java SDK harness (Docker container), but I have not been able to figure out how to make ssl.truststore.location and ssl.keystore.location accessible inside the SDK harness' Docker environment. The job_endpoint argument is pointing to java -jar beam-runners-flink-1.10-job-server-2.27.0.jar --flink-master localhost:8081.
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_args.extend([
    '--job_name=paul_test',
    '--runner=PortableRunner',
    '--sdk_location=container',
    '--job_endpoint=localhost:8099',
    '--streaming',
    "--environment_type=DOCKER",
    f"--sdk_harness_container_image_overrides=.*java.*,{my_beam_sdk_docker_image}:{my_beam_docker_tag}",
])

with beam.Pipeline(options=PipelineOptions(pipeline_args)) as pipeline:
    kafka = pipeline | ReadFromKafka(
        consumer_config={
            "bootstrap.servers": "bootstrap-server:17032",
            "security.protocol": "SSL",
            "ssl.truststore.location": "/opt/keys/client.truststore.jks",  # how do I make this available to the Java SDK harness
            "ssl.truststore.password": "password",
            "ssl.keystore.type": "PKCS12",
            "ssl.keystore.location": "/opt/keys/client.keystore.p12",  # how do I make this available to the Java SDK harness
            "ssl.keystore.password": "password",
            "group.id": "group",
            "basic.auth.credentials.source": "USER_INFO",
            "schema.registry.basic.auth.user.info": "user:password"
        },
        topics=["topic"],
        max_num_records=2,
        # expansion_service="localhost:56938"
    )
    kafka | beam.Map(lambda x: print(x))
I tried specifying the image override option as --sdk_harness_container_image_overrides='.*java.*,beam_java_sdk:latest' - where beam_java_sdk:latest is a Docker image I based on apache/beam_java11_sdk:2.27.0 and that pulls the credentials in its entrypoint.sh. But Beam does not appear to use it; I see
INFO org.apache.beam.runners.fnexecution.environment.DockerEnvironmentFactory - Still waiting for startup of environment apache/beam_java11_sdk:2.27.0 for worker id 1-1
in the logs, which is soon inevitably followed by
Caused by: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /opt/keys/client.keystore.p12 of type PKCS12
In conclusion, my question is this: in Apache Beam, is it possible to make files available inside the Java SDK harness Docker container from the Python Beam SDK? If so, how might it be done?
Many thanks.
Currently, there is no straightforward way to achieve this. There are ongoing discussions and tracking issues to provide support for this kind of expansion service customization (see here, here, BEAM-12538 and BEAM-12539). That is the short answer.
The long answer is yes, you can do that. You would have to copy & paste ExpansionService.java into your codebase and build your custom expansion service, where you specify the default environment (DOCKER) and the default environment config (your image) here. You then have to run this expansion service manually and specify its address using the expansion_service parameter of ReadFromKafka.

Could not instantiate EventHubSourceProvider for Azure Databricks

Using the steps documented in structured streaming pyspark, I'm unable to create a DataFrame in PySpark from the Azure Event Hub I have set up in order to read the stream data.
Error message is:
java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.eventhubs.EventHubsSourceProvider could not be instantiated
I have installed the Maven libraries below (com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.12 is unavailable), but none appear to work:
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15
com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.6
I have also tried ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString), but the error message returned is:
java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
The connection string is correct as it is also used in a console application that writes to the Azure Event Hub and that works.
Can someone point me in the right direction, please? The code in use is as follows:
from pyspark.sql.functions import *
from pyspark.sql.types import *
# Event Hub Namespace Name
NAMESPACE_NAME = "*myEventHub*"
KEY_NAME = "*MyPolicyName*"
KEY_VALUE = "*MySharedAccessKey*"
# The connection string to your Event Hubs Namespace
connectionString = "Endpoint=sb://{0}.servicebus.windows.net/;SharedAccessKeyName={1};SharedAccessKey={2};EntityPath=ingestion".format(NAMESPACE_NAME, KEY_NAME, KEY_VALUE)
ehConf = {}
ehConf['eventhubs.connectionString'] = connectionString
# For 2.3.15 version and above, the configuration dictionary requires that connection string be encrypted.
# ehConf['eventhubs.connectionString'] = sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connectionString)
df = spark \
    .readStream \
    .format("eventhubs") \
    .options(**ehConf) \
    .load()
To resolve the issue, I did the following:
Uninstall the Azure Event Hubs library versions
Install the com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.15 library version from Maven Central
Restart the cluster
Validate by re-running the code provided in the question
I received this same error when installing libraries with the version number com.microsoft.azure:azure-eventhubs-spark_2.11:2.3.* on a Spark cluster running Spark 3.0 with Scala 2.12.
For anyone else finding this via Google: check that you have the correct Scala library version. In my case, my cluster is Spark v3 with Scala 2.12.
Changing the "2.11" in the library version from the tutorial I was using to "2.12", so that it matches my cluster runtime version, fixed the issue.
I had to take this a step further: in the format method I had to pass the provider class directly:
.format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")
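For reference, a minimal Scala sketch of the same read, putting the two fixes above together. It assumes a cluster on Scala 2.12 with the azure-eventhubs-spark_2.12 library attached, and the connection string is a placeholder:

import org.apache.spark.eventhubs.EventHubsConf

// placeholder connection string -- substitute your own namespace, policy, key and entity path
val ehConf = EventHubsConf("Endpoint=sb://<namespace>.servicebus.windows.net/;SharedAccessKeyName=<policy>;SharedAccessKey=<key>;EntityPath=ingestion")

val df = spark.readStream
  .format("org.apache.spark.sql.eventhubs.EventHubsSourceProvider")  // fully qualified provider, as suggested above
  .options(ehConf.toMap)
  .load()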
Check the cluster Scala version and the library version.
Uninstall the older libraries and install:
com.microsoft.azure:azure-eventhubs-spark_2.12:2.3.17
in the shared workspace (right-click and install library) and also on the cluster.

getSchema in PostgreSQL JDBC driver throws java.lang.AbstractMethodError or java.sql.SQLFeatureNotSupportedException

I'm using PostgreSQL 8.4 and my application is trying to connect to the database.
I've registered the driver:
DriverManager.registerDriver(new org.postgresql.Driver());
and then trying the connection:
db = DriverManager.getConnection(database_url);
(btw, my jdbc string is something like: jdbc:postgresql://localhost:5432/myschema?user=myuser&password=mypassword)
I've tried various version of the jdbc driver and getting two type of errors:
with jdbc3:
Exception in thread "main" java.lang.AbstractMethodError: org.postgresql.jdbc3.Jdbc3Connection.getSchema()Ljava/lang/String;
with jdbc4:
java.sql.SQLFeatureNotSupportedException: Il metodo «org.postgresql.jdbc4.Jdbc4Connection.getSchema()» non è stato ancora implementato.
which means: the method org.postgresql.jdbc4.Jdbc4Connection.getSchema() has not been implemented yet.
I'm missing something, but I don't know what.
------ SOLVED ---------
The problem was not in the connection string or the driver version; the problem was in the code directly after the getConnection() call:
db = DriverManager.getConnection(database_url);
LOGGER.info("Connected to : " + db.getCatalog() + " - " + db.getSchema());
It seems the PostgreSQL driver doesn't have a getSchema method, as the Java console had often been trying to tell me.
Connection.getSchema() was added in Java 7 / JDBC 4.1. This means that it is not necessarily available in a JDBC 3 or 4 driver (although if an implementation exists, it will get called).
If you use a JDBC 3 (Java 4/5) driver or a JDBC 4 (Java 6) driver in Java 7 or higher, it is entirely possible to receive a java.lang.AbstractMethodError when calling getSchema if it does not exist in the implementation. Java provides a form of forward compatibility for classes implementing an interface.
If new methods are added to an interface, classes that do not have these methods and were - for example - compiled against an older version of the interface can still be loaded and used, provided the new methods are not called. Missing methods will be stubbed by code that simply throws an AbstractMethodError. On the other hand: if a method getSchema had been implemented and the signature was compatible, that method would now be accessible through the interface, even though the method did not exist in the interface at compile time.
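To make that concrete, here is a small JVM-side sketch of a defensive call (written in Scala here; the same idea applies in Java), guarding the JDBC 4.1-only method so neither kind of driver takes the application down:

import java.sql.{Connection, SQLFeatureNotSupportedException}

def schemaOrUnknown(db: Connection): String =
  try db.getSchema                                    // added in JDBC 4.1 (Java 7)
  catch {
    case _: AbstractMethodError             => "<driver predates JDBC 4.1>"
    case _: SQLFeatureNotSupportedException => "<driver stubs getSchema>"
  }

The helper name schemaOrUnknown is made up for the example; the point is simply that both AbstractMethodError and SQLFeatureNotSupportedException can surface from the same call, depending on how the driver was built.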
In March 2011, the driver was updated so it could be compiled on Java 7 (JDBC 4.1); this was done by stubbing the new JDBC 4.1 methods with an implementation that throws a java.sql.SQLFeatureNotSupportedException, including the implementation of Connection.getSchema. This code is still in the current PostgreSQL JDBC driver version 9.3-1102. Technically, a JDBC-compliant driver is not allowed to throw SQLFeatureNotSupportedException unless the API documentation or JDBC specification explicitly allows it (which it doesn't for getSchema).
However, the current code on GitHub does provide an implementation since April this year. You might want to consider compiling your own version, or ask on the pgsql-jdbc mailing list whether recent snapshots are available (the snapshots link on http://jdbc.postgresql.org/ shows rather old versions).

Spring Data Cassandra - Environment must not be null Error

I am following the basic tutorial at the Spring Data Cassandra reference http://docs.spring.io/spring-data/cassandra/docs/1.1.0.RC1/reference/html/ and I am running into the following exception:
java.lang.IllegalArgumentException: Environment must not be null!
at org.springframework.util.Assert.notNull(Assert.java:112)
at org.springframework.data.repository.config.RepositoryConfigurationSourceSupport.<init>(RepositoryConfigurationSourceSupport.java:50)
at org.springframework.data.repository.config.AnnotationRepositoryConfigurationSource.<init>(AnnotationRepositoryConfigurationSource.java:74)
at org.springframework.data.repository.config.RepositoryBeanDefinitionRegistrarSupport.registerBeanDefinitions(RepositoryBeanDefinitionRegistrarSupport.java:74)
at org.springframework.context.annotation.ConfigurationClassParser.processImport(ConfigurationClassParser.java:394)
at org.springframework.context.annotation.ConfigurationClassParser.doProcessConfigurationClass(ConfigurationClassParser.java:204)
at org.springframework.context.annotation.ConfigurationClassParser.processConfigurationClass(ConfigurationClassParser.java:163)
at org.springframework.context.annotation.ConfigurationClassParser.parse(ConfigurationClassParser.java:138)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.processConfigBeanDefinitions(ConfigurationClassPostProcessor.java:284)
at org.springframework.context.annotation.ConfigurationClassPostProcessor.postProcessBeanDefinitionRegistry(ConfigurationClassPostProcessor.java:225)
at org.springframework.context.support.AbstractApplicationContext.invokeBeanFactoryPostProcessors(AbstractApplicationContext.java:630)
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:461)
at org.springframework.context.annotation.AnnotationConfigApplicationContext.<init>(AnnotationConfigApplicationContext.java:73)
at com.strides.platform.domain.UserRepositoryDaoTest.<init>(UserRepositoryDaoTest.java:28)
I have completed the steps mentioned in the document:
1) Use Cassandra Properties
2) Create Java configuration
3) Create domain and repository classes
I have autowired the Environment variable in the test classes. I checked a couple of sample projects and am not sure what more needs to be done.
I've encountered this error message and found that the problem only occurs when using Spring Framework version 3.2.8.RELEASE.
My solution was to upgrade to version 3.2.9.RELEASE.
See also java.lang.IllegalArgumentException: Environment must not be null

Error in implementing hmsonline Cassandra-Trigger

I was trying to implement triggers in Cassandra.
I have been trying to use the available Cassandra help: https://github.com/hmsonline/cassandra-triggers
I am porting it to the latest version, 1.2.3, and following the GettingStarted instructions.
When I set the value for the trigger, inserted the data to fire the log, and checked the logs, the following is the error I received:
java.lang.AssertionError
at org.apache.cassandra.thrift.ThriftSessionManager.currentSession(ThriftSessionManager.java:51)
at org.apache.cassandra.thrift.CassandraServer.state(CassandraServer.java:88)
at org.apache.cassandra.thrift.CassandraServer.validateLogin(CassandraServer.java:881)
at org.apache.cassandra.thrift.CassandraServer.set_keyspace(CassandraServer.java:1492)
at com.hmsonline.cassandra.triggers.dao.CassandraStore.getConnection(CassandraStore.java:42)
at com.hmsonline.cassandra.triggers.dao.ConfigurationStore.getConfiguration(ConfigurationStore.java:76)
at com.hmsonline.cassandra.triggers.dao.ConfigurationStore.isCommitLogEnabled(ConfigurationStore.java:44)
at com.hmsonline.cassandra.triggers.TriggerTask.run(TriggerTask.java:47)
at java.lang.Thread.run(Thread.java:636)
But we tested it and it works fine on an older version (e.g. 1.1.2).
So is this a configuration problem, or has the Thrift API implementation changed?
Thanks