VoltDB not fetching data from Kafka topic - apache-kafka

I am using VoltDB, and my use case is to import data from Kafka into VoltDB.
I am using the following command:
Command:
kafkaloader test --brokers <>:2181, --topic kafkavoltdb
In deployment.xml file the configuration is:
<security enabled="false" provider="hash"/>
<import>
<configuration type="kafka" enabled="true" format="csv">
<property name="topics">kafkavoltdb</property>
<property name="procedure">TEST.insert</property>
<property name="brokers">brokers:6667</property>
</configuration>
</import>
I am not able to fetch data from Kafka into VoltDB; the kafkaloader command hangs without throwing any error. The logs show:
Failed to get Kafka partition info
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata.
Note: I am using Apache Kafka (HDP 3.0, Kerberos-secured cluster).
Kindly help me with a solution.
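One likely cause (an assumption based on the command shown, not confirmed in the thread): `--brokers <>:2181` points at ZooKeeper's port, while kafkaloader's `--brokers` option expects the Kafka broker listener itself — 9092 by default, and commonly 6667 on HDP, matching the deployment.xml above. A corrected invocation might look like:

```shell
# Hypothetical host name; point --brokers at the Kafka broker listener,
# not at ZooKeeper (port 2181).
kafkaloader test --brokers kafka-broker-1:6667 --topic kafkavoltdb
```

On a Kerberized cluster the client must additionally be configured for SASL/GSSAPI authentication; without it, metadata fetches time out exactly as in the log shown.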

Related

Timeout with KafkaAppender in log4j?

When I use the KafkaAppender of Log4j with a single broker and that broker is stopped, the appender waits for a very long time before failing. I use syncSend=false. I want to set a timeout so the appender doesn't wait that long.
Could you tell me how to configure the KafkaAppender to prevent this wait?
There is no timeout setting on the KafkaAppender itself, but there are a few timeout options that can be configured on the underlying KafkaProducer. The options are described in the Kafka documentation.
Here is an example Kafka appender configuration with two KafkaProducer timeout settings at their default values:
<Appenders>
<Kafka name="Kafka" topic="log-test">
<PatternLayout pattern="%date %message"/>
<Property name="bootstrap.servers">localhost:9092</Property>
<Property name="request.timeout.ms">30000</Property><!-- 30 seconds -->
<Property name="transaction.timeout.ms">60000</Property><!-- 1 minute -->
</Kafka>
</Appenders>
You might want to tune those values to get the expected behaviour.
Also, remember that the syncSend option was only added in Log4j 2.8; in older versions it has no effect.

Log4j2 Kafka appender failover handling when all Kafka brokers are unavailable?

I am sending all my application logs to Kafka using the Log4j2 Kafka appender, and it works. But when I purposefully bring down the broker, the application hangs and the Kafka appender keeps retrying to establish the connection.
How can I stop writing into Kafka when the broker(s) are down, and resume once they are available again?
Following is the appender configuration I have used.
<Kafka name="KafkaServiceStatInfo" topic="testKafkaLogs">
<PatternLayout pattern="%m"/>
<Property name="bootstrap.servers">localhost:9092</Property>
<Property name="acks">0</Property>
</Kafka>
<Async name="Async">
<AppenderRef ref="KafkaServiceStatInfo"/>
</Async>
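One possible approach (a sketch, not from the original thread): wrap the Kafka appender in a Log4j2 Failover appender, so that when Kafka writes fail, events are routed to a fallback appender instead of blocking. The `Console` appender referenced below is an assumption and would need to be defined in your configuration; consider also bounding the producer's `max.block.ms` property so failures surface quickly.

```xml
<!-- Hypothetical sketch: on Kafka appender failure, fall back to a
     Console appender (assumed to be defined elsewhere) and retry
     Kafka every 60 seconds. -->
<Failover name="KafkaFailover" primary="KafkaServiceStatInfo" retryIntervalSeconds="60">
  <Failovers>
    <AppenderRef ref="Console"/>
  </Failovers>
</Failover>
```

Loggers would then reference `KafkaFailover` rather than the Kafka appender directly.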

Redelivering JMS messages from the DLQ

I have two components communicating over a JMS queue in a WildFly instance. As soon as the consumer of the queue disconnects or is stopped, the messages are forwarded to the DLQ (at least when WildFly is restarted).
Is it possible to configure wildfly to automatically redeliver the messages from DLQ as soon as a consumer reconnects to the queue?
Some details
Wildfly version: 8.2.0
standalone.xml - As far as I can tell, nothing special
<jms-destinations>
<jms-queue name="ExpiryQueue">
<entry name="java:/jms/queue/ExpiryQueue"/>
<durable>false</durable>
</jms-queue>
<jms-queue name="DLQ">
<entry name="java:/jms/queue/DLQ"/>
<durable>false</durable>
</jms-queue>
...
<jms-queue name="Q1-Producer-to-Consumer">
<entry name="java:/queue/Q1-Producer-to-Consumer"/>
<entry name="java:jboss/exported/queue/Q1-Producer-to-Consumer"/>
<durable>false</durable>
</jms-queue>
</jms-destinations>
Thanks.
The DLQ only gets messages that have thrown an exception during message processing. If a consumer disconnects, the messages will simply remain on the queue awaiting delivery.
If you are seeing messages hit the DLQ during a server restart, this suggests that your consumer starts consuming before the resources it requires are available, and therefore errors while processing the messages. You would be better off fixing your consumer so it does not start consuming too early, rather than trying to fish the failed messages back out of the DLQ.
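If the goal is simply to make transient failures less fatal, the redelivery settings on the messaging subsystem can help. A hedged sketch for WildFly 8 (which ships HornetQ) — the match pattern and values below are illustrative assumptions, and this reduces how often messages dead-letter rather than replaying them from the DLQ:

```xml
<!-- Hypothetical address-setting for WildFly 8 (HornetQ): retry failed
     deliveries several times with a delay before dead-lettering. -->
<address-setting match="jms.queue.Q1-Producer-to-Consumer">
    <dead-letter-address>jms.queue.DLQ</dead-letter-address>
    <redelivery-delay>5000</redelivery-delay>
    <max-delivery-attempts>10</max-delivery-attempts>
</address-setting>
```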

Kafka connect (sink file not updated)

I have updated the properties files below according to my requirements:
connect-standalone.properties
connect-file-source.properties
connect-file-sink.properties
The Kafka Connect process starts, and the source connector reads lines and writes them as messages to test_topic, but the messages are not written to test.sink.txt.
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
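A frequent cause is a mismatch between the sink connector's `topics` property and the topic the source connector actually writes to, or a `file` path resolved against a different working directory than expected. For comparison, a minimal file sink configuration along the lines of the Kafka quickstart looks like this (the values are assumptions and must match your setup):

```properties
# Sketch of connect-file-sink.properties; "topics" must name the topic
# the source connector publishes to, and "file" is resolved relative to
# the directory Kafka Connect is started from.
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=test.sink.txt
topics=test_topic
```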

Unable to save dataframe in redshift

I'm reading a large dataset from an HDFS location and saving my dataframe into Redshift.
df.write
.format("com.databricks.spark.redshift")
.option("url", "jdbc:redshift://redshifthost:5439/database?user=username&password=pass")
.option("dbtable", "my_table_copy")
.option("tempdir", "s3n://path/for/temp/data")
.mode("error")
.save()
After some time I get the following error:
s3.amazonaws.com:443 failed to respond
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:143)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:223)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:685)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:487)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:334)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:281)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:1043)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.copyObjectImpl(RestStorageService.java:2029)
at org.jets3t.service.StorageService.copyObject(StorageService.java:871)
at org.jets3t.service.StorageService.copyObject(StorageService.java:916)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.copy(Jets3tNativeFileSystemStore.java:323)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.rename(NativeS3FileSystem.java:707)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:370)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:384)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:326)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:230)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:
I found the same issue reported on GitHub:
s3.amazonaws.com:443 failed to respond
Am I doing something wrong? Any help is appreciated.
I had the same issue; in my case I was also using AWS EMR.
The Databricks spark-redshift library uses Amazon S3 to transfer data in and out of Redshift efficiently: it first writes the data to S3 as Avro files, and then loads those files into Redshift, on EMR via EMRFS.
You have to configure your EMRFS settings and it will work.
The EMR File System (EMRFS) and the Hadoop Distributed File System
(HDFS) are both installed on your EMR cluster. EMRFS is an
implementation of HDFS which allows EMR clusters to store data on
Amazon S3.
EMRFS will try to verify list consistency for objects tracked in its
metadata for a specific number of retries (emrfs-retry-logic). The default is 5. When
the number of retries is exceeded, the originating job
returns a failure. To overcome this issue you can override your
default EMRFS configuration with the following steps:
Step 1: Log in to your EMR master instance.
Step 2: Add the following properties to /usr/share/aws/emr/emrfs/conf/emrfs-site.xml:
sudo vi /usr/share/aws/emr/emrfs/conf/emrfs-site.xml
<property>
<name>fs.s3.consistent.throwExceptionOnInconsistency</name>
<value>false</value>
</property>
<property>
<name>fs.s3.consistent.retryPolicyType</name>
<value>fixed</value>
</property>
<property>
<name>fs.s3.consistent.retryPeriodSeconds</name>
<value>10</value>
</property>
<property>
<name>fs.s3.consistent</name>
<value>false</value>
</property>
Then restart your EMR cluster.
Also configure your Hadoop configuration, for example hadoopConf.set("fs.s3a.attempts.maximum", "30"):
val hadoopConf = SparkDriver.getContext.hadoopConfiguration
// Use the native S3 filesystem implementation for s3:// paths
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
// Retry S3 requests more aggressively before giving up
hadoopConf.set("fs.s3a.attempts.maximum", "30")
// Credentials for the s3n:// tempdir used by spark-redshift
hadoopConf.set("fs.s3n.awsAccessKeyId", awsAccessKeyId)
hadoopConf.set("fs.s3n.awsSecretAccessKey", awsSecretAccessKey)