Configure log4j2.properties for Kafka appender, error when parsing property bootstrap.servers (Scala)

I want to add a Kafka appender to the audit-hdfs log in a Cloudera cluster.
I have successfully configured a log4j2.xml file with a Kafka appender. Now I need to convert this log4j2.xml into a log4j2.properties file so that I can merge it with the HDFS log's log4j2.properties file. I am unable to do this, because when I launch my dummy process with log4j2.properties instead of the XML I get an error.
I have tried writing the properties file in several different ways, always running into problems with the bootstrap.servers property.
This is my properties file:
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = ALL
appenders = console,kafka
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %m%n
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern =%m%n
appender.kafka.property.type = Property
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.topic = cdh-audit-hdfs
The problem is in this line:
appender.kafka.property.bootstrap.servers = ip:port
I have tried the following, to no avail:
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.bootstrap\.servers = ip:port
appender.kafka.property.name = "bootstrap.servers"
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.key = "bootstrap.servers"
appender.kafka.property.value = ip:port
etc...
This is my dummy process:
package blabla

import org.apache.logging.log4j.LogManager

object dummy extends App {
  val logger = LogManager.getLogger
  val record = "...c"
  while (true) {
    logger.info(record)
    Thread.sleep(5000)
  }
}
How do I need to configure my log4j2.properties in order to be able to define this property?
I'm expecting this process to write the record to my Kafka topic, but instead I get errors like these:
Exception in thread "main" org.apache.logging.log4j.core.config.ConfigurationException: No type attribute provided for component bootstrap
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.

Try this solution; it works for me:
appender.kafka.property.type=Property
appender.kafka.property.name=bootstrap.servers
appender.kafka.property.value=host:port
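For completeness, a sketch of how the Kafka appender section from the question looks with that fix merged in (same topic and layout as above; ip:port stands for your actual broker address):
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.topic = cdh-audit-hdfs
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern = %m%n
appender.kafka.property.type = Property
appender.kafka.property.name = bootstrap.servers
appender.kafka.property.value = ip:port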

Related

Bridge startup error connecting to Zookeeper

I'm getting the following error when trying to set up the Bridge component with Zookeeper, following the steps described in https://docs.corda.r3.com/website/releases/3.1/bridge-configuration-file.html?highlight=zookeeper.
> java -jar corda-bridgeserver-3.1.jar
BridgeSupervisorService: active = false
[ERROR] 20:59:31-0300 [main-EventThread] imps.EnsembleTracker.processConfigData - Invalid config event received: {server.1=10.102.32.104:2888:3888:participant, version=100000000, server.3=10.102.32.108:2888:3888:participant, server.2=10.102.32.107:2888:3888:participant}
[ERROR] 20:59:32-0300 [main-EventThread] imps.EnsembleTracker.processConfigData - Invalid config event received: {server.1=10.102.32.104:2888:3888:participant, version=100000000, server.3=10.102.32.108:2888:3888:participant, server.2=10.102.32.107:2888:3888:participant}
My bridge.conf:
bridgeMode = BridgeInner
outboundConfig {
    artemisBrokerAddress = "10.102.32.97:10010"
    alternateArtemisBrokerAddresses = [ "10.102.32.98:10010" ]
}
bridgeInnerConfig {
    floatAddresses = ["10.102.32.103:12005", "10.102.32.105:12005"]
    expectedCertificateSubject = "CN=Float Local,O=Local Only,L=London,C=GB"
    customSSLConfiguration {
        keyStorePassword = "bridgepass"
        trustStorePassword = "trustpass"
        sslKeystore = "./bridgecerts/bridge.jks"
        trustStoreFile = "./bridgecerts/trust.jks"
        crlCheckSoftFail = true
    }
}
haConfig {
    haConnectionString = "zk://10.102.32.104:2181,zk://10.102.32.107:2181,zk://10.102.32.108:2181"
}
networkParametersPath = ./network-parameters
Any thoughts?
This error is harmless. It indicates that the Dockerised Zookeeper has bad IP addresses, so when Apache Curator is sent the dynamic topology, some checks fail. It does not invalidate the static configuration, and everything should work fine.
Note that as of Corda Enterprise 3.2, you must use the Zookeeper version that is compatible with the Apache Curator library, which is 3.5.3-beta, and NOT the latest version.

Unable to publish in Kafka

I want to publish to a Kafka topic.
I am unable to do so; the program halts.
I am getting this error:
KafkaTimeoutError: Failed to update metadata after 60.0 secs.
def saveResults(response):
    entities_tweet = response["entities"]
    for entity in entities_tweet:
        try:
            for i in entity_dict:
                for j in entity_dict[i]:
                    if(entity["text"] in j):
                        entity["tweet"] = response["tweet"]
                        entity["tweetId"] = response["tweetId"]
                        entity["timeStamp"] = response["timeStamp"]
                        #entity["userProfile"] = response["userProfile"]
                        future = producer.send('argentina-iceland-june-16-watson', bytes(entity))
                        print("Published.")
                    else:
                        print("All ignored.")
                        future = producer.send('argentina-iceland-june-16-watson', bytes(entity))
                        print("Published")
        except Exception as e:
            print (e)
        finally:
            producer.flush()
However, this is working:
from kafka import KafkaProducer
from kafka.errors import KafkaError
producer = KafkaProducer(bootstrap_servers=['broker1:1234'])
# Asynchronous by default
future = producer.send('my-topic', b'raw_bytes')
It looks like you're using an incorrect bootstrap server; it should be broker1:9092 (the default Kafka listener port) instead of broker1:1234.

Replacement for SimpleConsumer.send in Kafka?

SimpleConsumer has been deprecated in Kafka, with org.apache.kafka.clients.consumer.KafkaConsumer being the replacement. However, it doesn't have a send(...) method. How can I rewrite the code below using the new KafkaConsumer?
import scala.concurrent.duration._
import kafka.api.TopicMetadataRequest
import kafka.consumer.SimpleConsumer
....
val consumer = new SimpleConsumer(
  host = "127.0.0.1",
  port = 9092,
  soTimeout = 2.seconds.toMillis.toInt,
  bufferSize = 1024,
  clientId = "health-check")
// this will fail if Kafka is unavailable
consumer.send(new TopicMetadataRequest(Nil, 1))
You can get topic metadata with .partitionsFor and .listTopics.
There is no direct replacement for that method; it depends on what you want to do.
If you need info on all partitions of a topic, the new API has consumer.partitionsFor(topic) for that.
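For example, a minimal health-check sketch with the new consumer (assuming kafka-clients on the classpath and the same broker address 127.0.0.1:9092 as in the question; the object name is illustrative):
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.serialization.StringDeserializer

object KafkaHealthCheck extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "127.0.0.1:9092")
  props.put("client.id", "health-check")
  props.put("key.deserializer", classOf[StringDeserializer].getName)
  props.put("value.deserializer", classOf[StringDeserializer].getName)

  val consumer = new KafkaConsumer[String, String](props)
  try {
    // listTopics() fetches cluster metadata and throws a TimeoutException
    // if no broker answers within the client's configured timeout,
    // so it can stand in for the old TopicMetadataRequest-based check.
    val topics = consumer.listTopics().asScala
    println(s"Kafka reachable, ${topics.size} topics visible")
    // Metadata for a single topic: consumer.partitionsFor("my-topic")
  } finally {
    consumer.close()
  }
}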

How to use Spark BigQuery Connector locally?

For test purposes, I would like to use the BigQuery Connector to write Parquet Avro logs to BigQuery. As I'm writing this, there is no way to ingest Parquet directly from the UI, so I'm writing a Spark job to do so.
In Scala, for the time being, the job body is the following:
val events: RDD[RichTrackEvent] =
  readParquetRDD[RichTrackEvent, RichTrackEvent](sc, googleCloudStorageUrl)
val conf = sc.hadoopConfiguration
conf.set("mapred.bq.project.id", "myproject")

// Output parameters
val projectId = conf.get("fs.gs.project.id")
val outputDatasetId = "logs"
val outputTableId = "test"
val outputTableSchema = LogSchema.schema

// Output configuration
BigQueryConfiguration.configureBigQueryOutput(
  conf, projectId, outputDatasetId, outputTableId, outputTableSchema
)
conf.set(
  "mapreduce.job.outputformat.class",
  classOf[BigQueryOutputFormat[_, _]].getName
)

events
  .mapPartitions { items =>
    val gson = new Gson()
    items.map(e => gson.fromJson(e.toString, classOf[JsonObject]))
  }
  .map(x => (null, x))
  .saveAsNewAPIHadoopDataset(conf)
As the BigQueryOutputFormat isn't finding the Google credentials, it falls back to the metadata host to try to discover them, with the following stacktrace:
2016-06-13 11:40:53 WARN HttpTransport:993 - exception thrown while executing request
java.net.UnknownHostException: metadata
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:160)
at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:207)
at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:72)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.createBigQueryCredential(BigQueryFactory.java:81)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQuery(BigQueryFactory.java:101)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQueryHelper(BigQueryFactory.java:89)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputCommitter.<init>(BigQueryOutputCommitter.java:70)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:102)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:84)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:30)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1135)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1078)
This is of course expected, but it should be able to use my service account and its key, since GoogleCredential.getApplicationDefault() returns the appropriate credentials fetched from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
Since the connector seems to read credentials from the Hadoop configuration, which keys do I need to set so that it reads GOOGLE_APPLICATION_CREDENTIALS? Is there a way to configure the output format to use a provided GoogleCredential object?
If I understand your question correctly - you might want to set:
<name>mapred.bq.auth.service.account.enable</name>
<name>mapred.bq.auth.service.account.email</name>
<name>mapred.bq.auth.service.account.keyfile</name>
<name>mapred.bq.project.id</name>
<name>mapred.bq.gcs.bucket</name>
Here, the mapred.bq.auth.service.account.keyfile should point to the full file path to the older-style "P12" keyfile; alternatively, if you're using the newer "JSON" keyfiles, you should replace the "email" and "keyfile" entries with the single mapred.bq.auth.service.account.json.keyfile key:
<name>mapred.bq.auth.service.account.enable</name>
<name>mapred.bq.auth.service.account.json.keyfile</name>
<name>mapred.bq.project.id</name>
<name>mapred.bq.gcs.bucket</name>
Also, you might want to take a look at https://github.com/spotify/spark-bigquery, which is a much more civilised way of working with BQ and Spark. The JSON keyfile passed to its setGcpJsonKeyFile method is the same file you'd set for mapred.bq.auth.service.account.json.keyfile when using the BQ connector for Hadoop.
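For reference, a minimal sketch of setting the mapred.bq.* keys above on the Spark job's Hadoop configuration, assuming a hypothetical JSON keyfile path and staging bucket name:
val conf = sc.hadoopConfiguration
// Authenticate with a service account instead of falling back to the metadata host
conf.set("mapred.bq.auth.service.account.enable", "true")
// Hypothetical path to the JSON keyfile (the same file GOOGLE_APPLICATION_CREDENTIALS points to)
conf.set("mapred.bq.auth.service.account.json.keyfile", "/path/to/service-account.json")
conf.set("mapred.bq.project.id", "myproject")
// Hypothetical GCS bucket the connector can use for staging
conf.set("mapred.bq.gcs.bucket", "my-staging-bucket")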

AWS: InvalidSignature exception while adding record

An InvalidSignatureException occurs when trying to add a user record using the Kinesis Producer Library.
AWS_JAVA_SDK_VERSION=1.10.26
AWS_KINESIS_PRODUCER_VERSION=0.10.1
ERROR:
PutRecords failed: {"__type":"InvalidSignatureException","message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method.
SCALA KINESIS PRODUCER CODE
private val configuration: KinesisProducerConfiguration = new KinesisProducerConfiguration
val credentialsProvider: AWSCredentialsProvider = AwsUtil.getAwsCredentials(config.awsAccessKey, config.awsSecretKey)
configuration.setCredentialsProvider(credentialsProvider)
configuration.setRecordMaxBufferedTime(config.timeLimit)
configuration.setAggregationMaxCount(1)
configuration.setRegion(config.streamRegion)
configuration.setMetricsLevel("none")
private val kinesisProducer = new KinesisProducer(configuration)
kinesisProducer.addUserRecord(streamName, key, eventBytes)
The above code is not working, but it is possible for me to add records to the Kinesis stream through the AWS CLI from the terminal, and via the KinesisClient code specified below.
private def createKinesisClient = {
  val accessKey = config.awsAccessKey
  val secretKey = config.awsSecretKey
  val credentialsProvider: AWSCredentialsProvider = AwsUtil.getAwsCredentials(accessKey, secretKey)
  val client = new AmazonKinesisClient(credentialsProvider)
  client.setEndpoint(config.streamEndpoint)
  client
}
This happens because your VM/PC/server clock might be skewed.
If you're running ubuntu, try updating your system time:
sudo ntpdate ntp.ubuntu.com
If you are using docker-machine on Mac, you can resolve with this command:
docker-machine ssh default 'sudo ntpclient -s -h pool.ntp.org'