Why Phantom Cassandra Insert getting "main" java.lang.NoClassDefFoundError: scala/reflect/runtime/package error - scala

I very new to phantom and I have basic knowledge of scala and cassandra. so i am trying to do simple crud example using phantom so i have look at few examples at google. at last i have end up with this example. but I run this code not like as this project readme file said. I just create a intellij project and there is main class like following.
import org.creative.document.dao.impl.{PhantomConnectorExample}
import org.creative.document.database.SongsDatabase
import org.creative.document.entity.Song
object DocumentAPIMain {
def main(args: Array[String]): Unit = {
val song = Song(songId,"Wahinnta Athnam gigum di","Pawana","Nanda Malini")
val songsDatabase = new SongsDatabase(PhantomConnectorExample.connector)
songsDatabase.saveOrUpdate(song)
}
}
that was the only chage i have done other than that all the dependencies and everything as like above mention link.
I have created cassandra keyspace, songs table and songs_by_artist table manually. following is show my cassandra shema
CREATE KEYSPACE phantomtest WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
CREATE TABLE phantomtest.songs_by_artist (
artist text,
id timeuuid,
album text,
title text,
PRIMARY KEY (artist, id)
) WITH CLUSTERING ORDER BY (id ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
CREATE TABLE phantomtest.songs (
id timeuuid PRIMARY KEY,
album text,
artist text,
title text
) WITH bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 0
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';
After all when i run my main method it will end up with the following error
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Exception in thread "main" java.lang.NoClassDefFoundError: scala/reflect/runtime/package$
at com.outworkers.phantom.column.AbstractColumn.com$outworkers$phantom$column$AbstractColumn$$_name(AbstractColumn.scala:55)
at com.outworkers.phantom.column.AbstractColumn.com$outworkers$phantom$column$AbstractColumn$$_name$(AbstractColumn.scala:54)
at com.outworkers.phantom.column.Column.com$outworkers$phantom$column$AbstractColumn$$_name$lzycompute(Column.scala:22)
at com.outworkers.phantom.column.Column.com$outworkers$phantom$column$AbstractColumn$$_name(Column.scala:22)
at com.outworkers.phantom.column.AbstractColumn.name(AbstractColumn.scala:58)
at com.outworkers.phantom.column.AbstractColumn.name$(AbstractColumn.scala:58)
at com.outworkers.phantom.column.Column.name(Column.scala:22)
at org.creative.document.database.SongsDatabase$anon$macro$1$1.store(SongsDatabase.scala:28)
at org.creative.document.database.SongsDatabase$anon$macro$1$1.store(SongsDatabase.scala:28)
at com.outworkers.phantom.CassandraTable.store(CassandraTable.scala:132)
at org.creative.document.database.SongsDatabase.saveOrUpdate(SongsDatabase.scala:28)
at org.creative.document.main.DocumentAPIMain$.main(DocumentAPIMain.scala:58)
at org.creative.document.main.DocumentAPIMain.main(DocumentAPIMain.scala)
Caused by: java.lang.ClassNotFoundException: scala.reflect.runtime.package$
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
Please help me to find out the problem. what I have did wrong...

I faced the same issue and added the following dependency.
"org.scala-lang" % "scala-reflect" % scalaVersion.value
It worked for me.

Related

Pyflink 1.14 table connectors - Kafka authentication

I've only seen Pyflink table API examples of kafka connections which does not contain authentication details in the connection establishment (doc ref), namely source table connection:
source_ddl = """
CREATE TABLE source_table(
a VARCHAR,
b INT
) WITH (
'connector' = 'kafka',
'topic' = 'source_topic',
'properties.bootstrap.servers' = 'kafka:9092',
'properties.group.id' = 'test_3',
'scan.startup.mode' = 'latest-offset',
'format' = 'json'
)
"""
I however need to connect to kafka sources with authentication enabled. By 'interpreting' that all property.XXX are dedicated as kafka config, I altered the examples as follows and tested:
import os
from pyflink.datastream.stream_execution_environment import StreamExecutionEnvironment
from pyflink.table import TableEnvironment, EnvironmentSettings, environment_settings
from pyflink.table.table_environment import StreamTableEnvironment
KAFKA_SERVERS = 'localhost:9092'
KAFKA_USERNAME = "user"
KAFKA_PASSWORD = "XXX"
KAFKA_SOURCE_TOPIC = 'source'
KAFKA_SINK_TOPIC = 'dest'
def log_processing():
env = StreamExecutionEnvironment.get_execution_environment()
env.add_jars("file:///opt/flink/lib_py/kafka-clients-2.4.1.jar")
env.add_jars("file:///opt/flink/lib_py/flink-connector-kafka_2.11-1.14.0.jar")
env.add_jars("file:///opt/flink/lib_py/flink-sql-connector-kafka_2.12-1.14.0.jar")
settings = EnvironmentSettings.new_instance()\
.in_streaming_mode()\
.use_blink_planner()\
.build()
t_env = StreamTableEnvironment.create(stream_execution_environment= env, environment_settings=settings)
source_ddl = f"""
CREATE TABLE source_table(
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight INT,
Acceleration INT,
Model_Year INT,
USA INT,
Europe INT,
Japan INT
) WITH (
'connector' = 'kafka',
'topic' = '{KAFKA_SOURCE_TOPIC}',
'properties.bootstrap.servers' = '{KAFKA_SERVERS}',
'properties.group.id' = 'testgroup12',
'properties.sasl.mechanism' = 'PLAIN',
'properties.security.protocol' = 'SASL_PLAINTEXT',
'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username=\"{KAFKA_USERNAME}\" password=\"{KAFKA_PASSWORD}\";',
'scan.startup.mode' = 'latest-offset',
'format' = 'json'
)
"""
sink_ddl = f"""
CREATE TABLE sink_table(
Cylinders INT,
Displacement INT,
Horsepower INT,
Weight INT,
Acceleration INT,
Model_Year INT,
USA INT,
Europe INT,
Japan INT
) WITH (
'connector' = 'kafka',
'topic' = '{KAFKA_SINK_TOPIC}',
'properties.bootstrap.servers' = '{KAFKA_SERVERS}',
'properties.group.id' = 'testgroup12',
'properties.sasl.mechanism' = 'PLAIN',
'properties.security.protocol' = 'SASL_PLAINTEXT',
'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username=\"{KAFKA_USERNAME}\" password=\"{KAFKA_PASSWORD}\";',
'scan.startup.mode' = 'latest-offset',
'format' = 'json'
)
"""
t_env.execute_sql(source_ddl)
t_env.execute_sql(sink_ddl)
t_env.sql_query("SELECT * FROM source_table").execute_insert("sink_table").wait()
t_env.execute("kafka-table")
if __name__ == '__main__':
log_processing()
By adding this job from the cli, there is no response or indication that a job in instantiated with a respective job id:
Respectively no job created when viewing flink UI
If I'm incorrectly configuring the connection, can someone please correct me, or point me to a relative source of documentation (I've googled quite a bit...)
Found the problem, as suggested by #DavidAnderson. The code from my question works as is... just required to update the dependency jars respectively. If using Scala 2.12 and flink version 1.14, the following dependencies are applicable (with the jar dependencies downloaded and available on you jobManager in the respective directory):
env.add_jars("file:///opt/flink/lib_py/kafka-clients-2.4.1.jar")
env.add_jars("file:///opt/flink/lib_py/flink-connector-kafka_2.12-1.14.0.jar")
env.add_jars("file:///opt/flink/lib_py/flink-sql-connector-kafka_2.12-1.14.0.jar")
A useful site to reference, which I found later on is https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kafka_2.12/1.14.0

Cannot use csvSftpConnector with schema registry

I'm trying to configure a standalone producer to read csv files from a sftp server and send data to a topic on the cloud.
So far I succeeded in reading my csv data from the file and parsing it according to my value.schema.
But now instead of using a fixed configuration schema, I'd like to use the schema registry. So I configured an AVRO schema for my test topic on the confluent cloud, generated the API key/secret and updated my config files.
I can see that connection is working fine, no authentication errors, via cli I can access the test schema, but when I try to run the producer I get the following error:
[2021-09-20 16:39:53,442] INFO SftpCsvSourceConnectorConfig values:
batch.size = 1000
behavior.on.error = IGNORE
cleanup.policy = NONE
csv.case.sensitive.field.names = false
csv.escape.char = 92
csv.file.charset = UTF-8
csv.first.row.as.header = false
csv.ignore.leading.whitespace = true
csv.ignore.quotations = false
csv.keep.carriage.return = false
csv.null.field.indicator = NEITHER
csv.quote.char = 34
csv.rfc.4180.parser.enabled = false
csv.separator.char = 44
csv.skip.lines = 0
csv.strict.quotes = false
csv.verify.reader = true
empty.poll.wait.ms = 250
error.path = /home/alberto/opt/confluent-6.2.0/sftp2/error
file.minimum.age.ms = 0
finished.path = /home/alberto/opt/confluent-6.2.0/sftp2/finished
input.file.pattern = .*.csv
input.path = /home/alberto/opt/confluent-6.2.0/sftp2/data
kafka.topic = testSchema
kerberos.keytab.path =
kerberos.user.principal =
key.schema = {"name" : "com.example.users.UserKey","type" : "STRUCT","isOptional" : true,"fieldSchemas" : {"material" : {"type" : "STRING","isOptional" : true}}}
parser.timestamp.date.formats = [yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
parser.timestamp.timezone = UTC
processing.file.extension = .PROCESSING
proxy.password = [hidden]
proxy.username =
schema.generation.enabled = false
schema.generation.key.fields = []
schema.generation.key.name = defaultkeyschemaname
schema.generation.value.name = defaultvalueschemaname
sftp.host = 192.168.1.6
sftp.password = [hidden]
sftp.port = 22
sftp.proxy.url =
sftp.username = user
timestamp.field =
timestamp.mode = PROCESS_TIME
tls.passphrase = [hidden]
tls.pemfile =
tls.private.key = [hidden]
tls.public.key = [hidden]
value.schema =
...
Caused by: org.apache.kafka.common.config.ConfigException: Both configs key.schema and value.schema must be set if schema.generation.enabled is false, but key.schema was not null and value.schema was null.
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.validateSchema(SftpSourceConnectorConfig.java:181)
at io.confluent.connect.sftp.source.SftpSourceConnectorConfig.<init>(SftpSourceConnectorConfig.java:121)
at io.confluent.connect.sftp.source.SftpCsvSourceConnectorConfig.<init>(SftpCsvSourceConnectorConfig.java:157)
at io.confluent.connect.sftp.SftpCsvSourceConnector.start(SftpCsvSourceConnector.java:44)
at org.apache.kafka.connect.runtime.WorkerConnector.doStart(WorkerConnector.java:184)
at org.apache.kafka.connect.runtime.WorkerConnector.start(WorkerConnector.java:209)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:348)
at org.apache.kafka.connect.runtime.WorkerConnector.doTransitionTo(WorkerConnector.java:331)
... 7 more
If I set schema.generation.enabled to true, it seems that it creates an empty schema:
value.schema = {"type":"STRUCT","isOptional":false,"fieldSchemas":{}}
and then I get:
org.apache.kafka.common.config.ConfigException: Failed to access Avro data from topic testSchema : Schema being registered is incompatible with an earlier schema for subject "testSchema-value"; error code: 409; error code: 409
as if it's trying to register a schema, except that it's not what I want, I just need to fetch the schema from the registry and use it.
If anyone need any addition information regarding the configuration I'll happy to provide.

Failed to deserialize Confluent Avro record using Flink-SQL

Cannot consume the Confluent Kafka data using Flink-sql.
Fink version:1.12-csadh1.3.0.0
Cluster:Cloudera(CDP)
Kafka:Confluent Kafka
SQL:
CREATE TABLE consumer_session_created (
***
) WITH (
'connector' = 'kafka',
'topic' = '***',
'scan.startup.mode' = 'earliest-offset',
'properties.bootstrap.servers' = '***:9092',
'properties.group.id' = '***',
'properties.security.protocol' = 'SASL_SSL',
'properties.sasl.mechanism' = 'PLAIN',
'properties.sasl.jaas.config' = 'org.apache.kafka.common.security.plain.PlainLoginModule required username="***" password="***"',
'properties.avro-confluent.basic-auth.credentials-source' = '***',
'properties.avro-confluent.basic-auth.user-info' = '***',
'value.format' = 'avro-confluent',
'value.fields-include' = 'EXCEPT_KEY',
'value.avro-confluent.schema-registry.url' = 'https://***',
'value.avro-confluent.schema-registry.subject' = '***'
)
Error Msg:
java.io.IOException: Failed to deserialize Avro record.
at org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:101)
at org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:44)
at org.apache.flink.api.common.serialization.DeserializationSchema.deserialize(DeserializationSchema.java:82)
at org.apache.flink.streaming.connectors.kafka.table.DynamicKafkaDeserializationSchema.deserialize(DynamicKafkaDeserializationSchema.java:113)
at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.partitionConsumerRecordsHandler(KafkaFetcher.java:179)
at org.apache.flink.streaming.connectors.kafka.internals.KafkaFetcher.runFetchLoop(KafkaFetcher.java:142)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:826)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241)
Caused by: java.io.IOException: Could not find schema with id 79 in registry
at org.apache.flink.formats.avro.registry.confluent.ConfluentSchemaRegistryCoder.readSchema(ConfluentSchemaRegistryCoder.java:77)
at org.apache.flink.formats.avro.RegistryAvroDeserializationSchema.deserialize(RegistryAvroDeserializationSchema.java:70)
at org.apache.flink.formats.avro.AvroRowDataDeserializationSchema.deserialize(AvroRowDataDeserializationSchema.java:98)
... 9 more
Caused by: io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Unauthorized; error code: 401
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:660)
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:642)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:217)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaBySubjectAndId(CachedSchemaRegistryClient.java:291)
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaById(CachedSchemaRegistryClient.java:276)
at io.confluent.kafka.schemaregistry.client.SchemaRegistryClient.getById(SchemaRegistryClient.java:64)
at org.apache.flink.formats.avro.registry.confluent.ConfluentSchemaRegistryCoder.readSchema(ConfluentSchemaRegistryCoder.java:74)
... 11 more
I thought I used the parameters avro-confluent.basic-auth.* in the wrong way, according to the flink Doc here. So I removed the prefix properties.:
WITH (
'connector',
***
'avro-confluent.basic-auth.credentials-source' = '***',
'avro-confluent.basic-auth.user-info' = '***',
***
)
However, another exception raised:
org.apache.flink.table.api.ValidationException: Unsupported options found for connector 'kafka'.
Unsupported options:
avro-confluent.basic-auth.credentials-source
avro-confluent.basic-auth.user-info
Tips:
we can consume/de-ser the kafka data correctly using DataStream API with same parameters, and this topic has been used for a long time by others'.
I'm not sure about Flink packed into Cloudera distributive, but in simple (original) Flink I connect to Kafka using SQL:
CREATE TABLE my_flink_table (
event_date AS TO_TIMESTAMP(eventtime_string_field, 'yyyyMMddHHmmssX')
,field1
,field2
...
,WATERMARK FOR event_date AS event_date - INTERVAL '10' MINUTE
) WITH (
'connector' = 'kafka',
'topic' = 'my_events_topic',
'scan.startup.mode' = 'earliest-offset',
'format' = 'avro-confluent',
'avro-confluent.schema-registry.url' = 'http://my_confluent_schma_reg_host:8081/',
'properties.group.id' = 'flink-test-001',
'properties.bootstrap.servers' = 'kafka_host01:9092'
);

Flink SQL CLI client CREATE TABLE from Kafka

I am trying to create a table in Apache Flink SQL client. I want to filter my JSON data in Flink, which arrives continously from a Kafka cluster.
The JSON looks like this:
{"lat":25.77,"lon":-80.19,"timezone":"America\/New_York",
"timezone_offset":-14400,
"current.dt":1592151550,
"current.sunrise":1592130546,
"current.sunset":1592179999,
"current.temp":302.77,
"current.feels_like":306.9,
"current.pressure":1017,
"current.humidity":78,
"current.dew_point":298.52,
"current.uvi":11.97,
"current.clouds":75,
"current.visibility":16093,
"current.wind_speed":3.6,
"current.wind_deg":60,
"current.weather.0.id":803,
"current.weather.0.main":"Clouds",
"current.weather.0.description":"broken clouds",
"current.weather.0.icon":"04d"}
The part I am interested in :
"current.weather.0.description":"broken clouds"
I want to filter my data whenever the current.weather description is "moderate rain". I tried to create two tables in Flink:
the Rain table, where the whole JSON arrives, and
where my filtered data will be stored and sent back to another Kafka cluster.
CREATE TABLE Rain (current.weather.0.description varchar) WITH ('connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'WeatherRawData',
'format.type' = 'json',
'connector.properties.0.key' = 'bootstrap.servers',
'connector.properties.0.value' = 'kafka:9092',
'connector.properties.1.key' = 'group.id',
'connector.properties.1.value' = 'flink-input-group',
'connector.startup-mode' = 'earliest-offset'
);
CREATE TABLE ProcessedRain(
current.weather.0.description varchar
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'WeatherProcessedData',
'format.type' = 'json',
'connector.properties.0.key' = 'bootstrap.servers',
'connector.properties.0.value' = 'kafka:9092',
'connector.properties.1.key' = 'group.id',
'connector.properties.1.value' = 'flink-output-group'
);
The error message I get :
[ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered "current" at line 1, column 20. Was expecting one of:
"PRIMARY" ...
"UNIQUE" ...
"WATERMARK" ...
<BRACKET_QUOTED_IDENTIFIER> ...
<QUOTED_IDENTIFIER> ...
<BACK_QUOTED_IDENTIFIER> ...
<IDENTIFIER> ...
<UNICODE_QUOTED_IDENTIFIER> ...
How should my CREATE TABLE be created correctly?
I think it should be
CREATE TABLE ProcessedRain (
`current.weather.0.description` VARCHAR
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'WeatherProcessedData',
'format.type' = 'json',
'connector.properties.bootstrap.servers' = 'kafka:9092',
'connector.properties.group.id' = 'flink-output-group'
);

spring with quartz.properties

I am trying to store the quartz scheduled job in the database by using postgreSQL.
The database name is "testdb" and the schema is "testschema". I have the following quartz.properties. I have a following quartz.properties, but I am getting this error message "org.postgresql.util.PSQLException: ERROR: relation "quartz_triggers" does not exist".
org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount = 10
org.quartz.threadPool.threadPriority = 5
org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread = true
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS
org.quartz.jobStore.tablePrefix = quartz_
org.quartz.dataSource.myDS.driver = org.postgresql.Driver
org.quartz.dataSource.myDS.URL = jdbc:postgresql://localhost:5432/postgres
org.quartz.dataSource.myDS.user = postgres
org.quartz.dataSource.myDS.password = welcome123
org.quartz.dataSource.myDS.maxConnections 10
Could anyone tell me what is wrong?
First of all, you must run the installation script that creates the quartz tables in your database.
The files are available in the full quartz jar file (/docs/dbTables)
Here is the documentation: http://quartz-scheduler.org/documentation/quartz-2.x/tutorials/tutorial-lesson-09
M.