Confluent Kafka GCS (Google Cloud Storage) Connector giving a parse error on loading the properties file

I'm new to Confluent and I am trying to use GCS as the sink after exporting data from Kafka. I am following this guide: https://docs.confluent.io/current/connect/kafka-connect-gcs/index.html
I get the following error when I try to start the connector:
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html
parse error: Expected separator between values at line 1, column 644
{
"error_code": 500,
"message": null
}
I've searched for a solution but can't seem to find one. Any help would be greatly appreciated!

In case someone runs into this in the future: just remove the double quotes ("") assigned to confluent.license in the properties file.
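For example, a minimal sketch of the relevant line in the sink connector properties file (the quotes presumably end up breaking the JSON that the dev CLI builds from the file):

# before - causes the parse error:
# confluent.license=""
# after - leave the trial license empty, with no quotes:
confluent.license=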

Related

AWS Glue job throwing Null pointer exception when writing df

I am trying to write a Glue job that reads data from S3 and writes to a BigQuery table (using the connector). The same script works correctly for other tables, but for one table the write fails.
It works on the first run, but after the initial load the incremental runs throw this NullPointerException. I have job bookmarks enabled to fetch new data added to S3 and write it to the BigQuery database.
I am already handling the new-data check: if there are files to process the job proceeds, otherwise it aborts.
In the job logs the DataFrame and its count print fine; everything seems to work until the write command runs, at which point the job fails.
I am not sure of the cause. I also tried making the nullability of source and target match by setting the source's nullable property to True (same as the target), but it still fails.
I can't make sense of the NullPointerException that is thrown.
Error:
Caused by: java.lang.NullPointerException
  at com.google.cloud.bigquery.connector.common.BigQueryClient.loadDataIntoTable(BigQueryClient.java:532)
  at com.google.cloud.spark.bigquery.BigQueryWriteHelper.loadDataToBigQuery(BigQueryWriteHelper.scala:87)
  at com.google.cloud.spark.bigquery.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.scala:66)
  ... 42 more
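For context, the stack trace points at the connector's indirect write path (stage the DataFrame to Cloud Storage, then load it with BigQueryClient.loadDataIntoTable). A minimal Scala sketch of that kind of write, with placeholder names since the job's actual code is not shown:

// Hypothetical sketch only - paths, dataset, table and bucket names are placeholders.
val df = spark.read.parquet("s3://my-bucket/incoming/")   // new files picked up by the job bookmark
df.write
  .format("bigquery")                                      // spark-bigquery-connector
  .option("table", "my_dataset.my_table")
  .option("temporaryGcsBucket", "my-temp-bucket")          // GCS staging bucket for the indirect write
  .mode("append")
  .save()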
The BigQuery connector provided through AWS had a bug. I contacted the AWS team and they suggested using the previous version of the connector.
Switching to the previous connector version resolved the issue.

Apache-Kafka: Issue in creating topic using windows cmd

I am new to Apache Kafka. I am using:
zookeeper-3.6.0
kafka-2.13-2.4.1
Windows-7
Earlier I was able to create and list topics on the same machine.
After I restarted my system, I am unable to create or list topics.
I am getting the errors below.
Create error: (screenshot: Topic create error)
List error: (screenshot: List Topic Error)
The error looks pretty simple. I googled a lot but unfortunately I am not able to find a solution to this.
I followed the below link for configurations:
http://programming-tips.in/kafka-set-up-apache-kafka-on-windows/
Looking forward to some expert advice.
Type the command out manually and it will work.
There is something wrong with the '-' characters in the example code; they are likely not plain ASCII hyphens when copied from that page.
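For reference, a minimal sketch of the two commands typed out by hand (topic name and broker address are placeholders; on Kafka 2.4.1 you can also use --zookeeper localhost:2181 in place of --bootstrap-server):

bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
bin\windows\kafka-topics.bat --list --bootstrap-server localhost:9092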

How to connect Ksql with ibm-cloud event-stream?

We created a project with IBM Functions and Event Streams in IBM Cloud.
Now I am trying to connect KSQL with IBM Cloud Event Streams, and I am following the documentation to get a basic idea of the integration.
Following the instructions, I created a file called ksql-server.properties and modified bootstrap.servers, username, and password according to my credentials. Then I ran ksql http://localhost:8088 --config-file ksql-server.properties with the KSQL local CLI. I assume everything has run correctly so far, since the ksql> prompt shows at the front of every new line.
Then I decided to check whether KSQL was connected to my IBM Cloud instance by running SHOW TOPICS;
It returned these error lines:
`Error issuing POST to KSQL server. path:ksql'`
`Caused by: com.fasterxml.jackson.databind.JsonMappingException: Failed to set 'ssl.protocol' to 'TLSv1.2' (through reference chain: io.confluent.ksql.rest.entity.KsqlRequest["streamsProperties"])`
`Caused by: Failed to set 'ssl.protocol' to 'TLSv1.2' (through reference chain: io.confluent.ksql.rest.entity.KsqlRequest["streamsProperties"])`
`Caused by: Failed to set 'ssl.protocol' to 'TLSv1.2'`
`Caused by: Cannot override property 'ssl.protocol'`
Also, I get quite lost at step 4 when it tells me to:
`Then start DataGen twice as follows:
i. With bootstrap-server=HOSTNAME:PORTNUMBER quickstart=users format=json topic=users maxInterval=10000 to start creating users events.
ii. With bootstrap-server=HOSTNAME:PORTNUMBER quickstart=pageviews format=delimited topic=pageviews maxInterval=10000 to start creating pageviews events.`
Has anyone done this before, or would anyone be willing to help me out? Thank you very much!
The IBM document is very out of date. KSQL runs as a client/server. The server needs to be run with the details of the broker, and then you can connect to it with a client, including the CLI, REST API, or web interface provided by Confluent Control Center.
So you need to run the KSQL server using your properties file:
./bin/ksql-server-start ksql-server.properties
and then connect to it with the CLI (for example):
./bin/ksql http://localhost:8088
See https://docs.confluent.io/current/ksql/docs/installation/installing.html for more information.
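For what it's worth, here is a minimal sketch of what ksql-server.properties might look like for a SASL/PLAIN-secured cluster such as Event Streams (broker address and API key are placeholders; check the property names against your KSQL version). The broker and security settings, including ssl.protocol, belong in the server's properties file rather than being passed through the CLI, which matches the "Cannot override property" error above:

# Where the KSQL server itself listens
listeners=http://localhost:8088
# Placeholder broker address from the Event Streams service credentials
bootstrap.servers=broker-0.my-event-streams-host.cloud.ibm.com:9093
# Event Streams uses SASL/PLAIN with the literal username "token" and the API key as the password
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
ssl.protocol=TLSv1.2
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="token" password="MY_API_KEY";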

Spark Scala S3 storage: permission denied

I've read a lot of topics on the Internet about how to get Spark working with S3, but nothing works properly.
I've downloaded Spark 2.3.2 built for Hadoop 2.7 and above.
I've copied only these libraries from Hadoop 2.7.7 (which matches the Spark/Hadoop version) into Spark's jars folder:
hadoop-aws-2.7.7.jar
hadoop-auth-2.7.7.jar
aws-java-sdk-1.7.4.jar
Still I can't use either S3N or S3A to get my file read by Spark.
For S3A I have this exception:
sc.hadoopConfiguration.set("fs.s3a.access.key","myaccesskey")
sc.hadoopConfiguration.set("fs.s3a.secret.key","mysecretkey")
val file = sc.textFile("s3a://my.domain:8080/test_bucket/test_file.txt")
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: AE203E7293ZZA3ED, AWS Error Code: null, AWS Error Message: Forbidden
Using this piece of Python, and some more code, I can list my buckets, list my files, download files, read files from my computer, and get a file URL.
This code gives me the following file url:
https://my.domain:8080/test_bucket/test_file.txt?Signature=%2Fg3jv96Hdmq2450VTrl4M%2Be%2FI%3D&Expires=1539595614&AWSAccessKeyId=myaccesskey
What should I install, set up, or download to get Spark able to read and write from my S3 server?
Edit 3:
Using the debug tool mentioned in the comments, here's the result.
It seems like the issue is with the signature, but I'm not sure what that means.
First you will need to download the hadoop-aws.jar and aws-java-sdk.jar that match your Spark/Hadoop release and add them to the jars folder inside the Spark folder.
Then you will need to specify the server you will use and enable path-style access if your S3 server does not support virtual-hosted-style (dynamic DNS) buckets:
sc.hadoopConfiguration.set("fs.s3a.path.style.access","true")
sc.hadoopConfiguration.set("fs.s3a.endpoint","my.domain:8080")
# I had to change the signature version because my S3 server implements an old API:
sc.hadoopConfiguration.set("fs.s3a.signing-algorithm","S3SignerType")
Here's my final code:
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val tmp = sc.textFile("s3a://test_bucket/test_file.txt")
sc.hadoopConfiguration.set("fs.s3a.access.key","mykey")
sc.hadoopConfiguration.set("fs.s3a.secret.key","mysecret")
sc.hadoopConfiguration.set("fs.s3a.endpoint","my.domain:8080")
sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled","true")
sc.hadoopConfiguration.set("fs.s3a.path.style.access","true")
sc.hadoopConfiguration.set("fs.s3a.signing-algorithm","S3SignerType")
tmp.count()
I would recommend putting most of the settings inside spark-defaults.conf:
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.endpoint mydomain:8080
spark.hadoop.fs.s3a.connection.ssl.enabled true
spark.hadoop.fs.s3a.signing-algorithm S3SignerType
One of the issues I had was setting spark.hadoop.fs.s3a.connection.timeout to 10: prior to Hadoop 3 this value is in milliseconds, which ends up giving you a very long wait, and the error message would only appear 1.5 minutes after the attempt to read a file.
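If you do want to change it, the value has to be given in milliseconds on Hadoop 2.x (200000 ms is the default), for example:

spark.hadoop.fs.s3a.connection.timeout 200000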
PS: Special thanks to Steve Loughran for the precious help.

Presto on AWS EMR: access via Hue

In a Hue notebook (AWS EMR v5.5), when trying to use Presto, a CLASSPATH error is encountered.
Logs:
File "/usr/lib/hue/build/env/lib64/python2.7/UserDict.py", line 40, in __getitem__
raise KeyError(key)
KeyError: 'CLASSPATH'
Information about exporting CLASSPATH to avoid this error is provided here:
Integrating JDBC-compatible databases
But the error is still encountered (after exporting CLASSPATH and restarting the Hue service on the master node). Has anyone encountered this and found a fix? Please share.
This sounds like the same issue as https://issues.cloudera.org/browse/HUE-5859
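For reference, a rough sketch of the setup that guide describes, assuming the KeyError simply means CLASSPATH is not visible to the process that launches Hue (jar path, host, and port are placeholders; check the hue.ini layout against your Hue version):

# CLASSPATH must point at the Presto JDBC driver and be set in the environment Hue is started from,
# not just in an interactive shell:
export CLASSPATH=/usr/lib/presto/presto-jdbc.jar:$CLASSPATH
# hue.ini entry for a JDBC-compatible database:
[notebook]
  [[interpreters]]
    [[[presto]]]
      name=Presto
      interface=jdbc
      options='{"url": "jdbc:presto://localhost:8889/hive/default", "driver": "com.facebook.presto.jdbc.PrestoDriver", "user": "", "password": ""}'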