Presto on AWS EMR: access via Hue - classpath

In a Hue notebook (AWS EMR v5.5), when trying to use Presto, a CLASSPATH error is encountered.
Logs:
File "/usr/lib/hue/build/env/lib64/python2.7/UserDict.py", line 40, in __getitem__
raise KeyError(key)
KeyError: 'CLASSPATH'
Information about exporting CLASSPATH to avoid this error is provided here:
Integrating JDBC-compatible databases
But the error is still encountered (after exporting CLASSPATH and restarting the Hue service on the master node). Has anyone run into this and found a fix? Please share.

This sounds like the same issue as https://issues.cloudera.org/browse/HUE-5859
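For reference, the setup that guide describes boils down to exporting the driver location before restarting Hue and declaring a JDBC interpreter in hue.ini; the jar path, port, and catalog below are assumptions rather than values from the original post:
# on the master node, before restarting Hue (jar path is an assumption):
export CLASSPATH=/usr/lib/presto/presto-jdbc.jar:$CLASSPATH
# and in hue.ini:
[notebook]
  [[interpreters]]
    [[[presto]]]
      name=Presto
      interface=jdbc
      options='{"url": "jdbc:presto://localhost:8889/hive/default", "driver": "com.facebook.presto.jdbc.PrestoDriver", "user": "", "password": ""}'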

Related

td-agent does not validate google cloud service account credentials

I'm trying to configure fluentd output with td-agent and the fluent-google-cloud plugin. The plugin and all dependencies are loaded, but fluentd is not outputting to Google Cloud Logging, and the td-agent log states error="Unable to read the credential file specified by GOOGLE_APPLICATION_CREDENTIALS: file /home/$(whoami)/.config/gcloud/service_account_credentials.json does not exist".
However, when I go to that file path, the file does exist, and the $GOOGLE_APPLICATION_CREDENTIALS variable is set to it as well. What should I do to fix this?
On the assumption that the error and you are both correct, I suspect (!) that you're checking as your own user account (== whoami) and finding /home/$(whoami)/.config/gcloud, while the agent is running (under systemd?) as root and looking somewhere it can't find the credentials file (perhaps /root/.config/gcloud).
It would be helpful if you included more details as to what you've done in order that we can better understand the issue.
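One quick way to check is to see which user the unit actually runs as and, if it isn't you, point the service at a copy of the key it can read; the unit name and path below are assumptions:
# which user does the agent run as? (unit name is an assumption)
systemctl show td-agent --property=User
# if it is not your user, give the service a path it can read, e.g. in a drop-in
# created with `sudo systemctl edit td-agent`:
[Service]
Environment="GOOGLE_APPLICATION_CREDENTIALS=/etc/td-agent/service_account_credentials.json"
Then copy the JSON key to that path with permissions the service user can read and restart the agent.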

Presto 313 password-authenticator.properties doesn't work with file

I'm running Presto in K8s and I'm trying to enable file-based authentication for the service (using this as a guide: https://prestosql.io/docs/current/security/password-file.html). However, as the application starts up I get an error saying:
java.lang.IllegalStateException: Password authenticator file is not registered
config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=20GB
query.max-memory-per-node=10GB
query.max-total-memory-per-node=10GB
discovery-server.enabled=true
discovery.uri=http://presto-service.eap.svc.cluster.local:8080
http-server.authentication.type=PASSWORD
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/opt/presto-server/etc/presto.jks
http-server.https.keystore.key=*************
password-authenticator.properties
password-authenticator.name=file
file.password-file=/opt/presto-server/etc/password.db
The rest of the config looks perfectly sane so does anybody know what I might have missed here?
Thanks,
Password file authentication was added in version 327, so you need to upgrade.
Security Changes
Add Password File Authentication. (#797)
You can get the latest version from https://prestosql.io/download.html.
Also, you can join the community Slack. https://prestosql.io/slack.html
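Once you're on 327 or later, the file pointed to by file.password-file is a bcrypt htpasswd file; per the linked docs it can be created with something like the following (the user name here is just an example):
touch password.db
htpasswd -B -C 10 password.db alice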

Error Could not initialize class net.sf.jasperreports.engine.util.JRStyledTextParser on Kubernetes container

After deploying JasperReports 6.0.0 on Kubernetes, I am getting the error below:
Could not initialize class net.sf.jasperreports.engine.util.JRStyledTextParser
I have tried multiple solutions, but the error is still not resolved.
Attempt 1: setting -Djava.awt.headless=true
I set this property in the YAML file as mentioned in
how to add CATALINA_OPTS=-Djava.awt.headless=true this property in Kubernetes configuration
but it did not work.
I also tried setting it as a system property with System.setProperty("java.awt.headless", "true"), but this did not work either.
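For reference, this is roughly what that first attempt looks like in a Deployment manifest when the flag is passed via an environment variable the JVM reads at startup (JAVA_TOOL_OPTIONS); all names here are placeholders:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-app                  # placeholder
spec:
  selector:
    matchLabels:
      app: report-app
  template:
    metadata:
      labels:
        app: report-app
    spec:
      containers:
        - name: report-app          # placeholder
          image: report-app:latest  # placeholder
          env:
            - name: JAVA_TOOL_OPTIONS
              value: "-Djava.awt.headless=true"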
Attempt 2: including the missing xml-apis dependency, based on this thread. It also did not work.
Can someone please suggest a solution? The report works absolutely fine in my local environment but fails on the server. The application is deployed in a Kubernetes container.

Confluent Kafka GCS (Google Cloud Storage) Connector giving a parse error on loading the properties file

I'm new to Confluent and I am trying to use GCS as the sink after exporting data from Kafka. I am following this guide: https://docs.confluent.io/current/connect/kafka-connect-gcs/index.html
I get the following error when I try to start the connector:
This CLI is intended for development only, not for production
https://docs.confluent.io/current/cli/index.html
parse error: Expected separator between values at line 1, column 644
{
"error_code": 500,
"message": null
}
I've searched for a solution but can't seem to find one. Any help would be greatly appreciated!
In case someone runs into this in the future: just remove the double quotes ("") assigned to confluent.license in the connector properties file.
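In other words, in the properties file from that quickstart the license line should have no quotes at all; a minimal sketch (topic, bucket, and credentials path are placeholders):
name=gcs-sink
connector.class=io.confluent.connect.gcs.GcsSinkConnector
tasks.max=1
topics=gcs_topic
gcs.bucket.name=my-bucket
gcs.credentials.path=/path/to/credentials.json
storage.class=io.confluent.connect.gcs.storage.GcsStorage
format.class=io.confluent.connect.gcs.format.json.JsonFormat
flush.size=3
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
# wrong: confluent.license=""
# right: leave the value empty, with no quotes
confluent.license=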

Spark Scala S3 storage: permission denied

I've read a lot of topics on the Internet about how to get Spark working with S3, but nothing works properly so far.
I've downloaded Spark 2.3.2 built for Hadoop 2.7 and later.
I've copied only some libraries from Hadoop 2.7.7 (which matches the Spark/Hadoop version) into the Spark jars folder:
hadoop-aws-2.7.7.jar
hadoop-auth-2.7.7.jar
aws-java-sdk-1.7.4.jar
Still, I can't get my file read by Spark using either S3N or S3A.
For S3A I get this exception:
sc.hadoopConfiguration.set("fs.s3a.access.key","myaccesskey")
sc.hadoopConfiguration.set("fs.s3a.secret.key","mysecretkey")
val file = sc.textFile("s3a://my.domain:8080/test_bucket/test_file.txt")
com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: Amazon S3, AWS Request ID: AE203E7293ZZA3ED, AWS Error Code: null, AWS Error Message: Forbidden
Using this piece of Python, and some more code, I can list my buckets, list my files, download files, read files and get file URLs, all from my computer.
That code gives me the following file URL:
https://my.domain:8080/test_bucket/test_file.txt?Signature=%2Fg3jv96Hdmq2450VTrl4M%2Be%2FI%3D&Expires=1539595614&AWSAccessKeyId=myaccesskey
What should I install / set up / download so that Spark is able to read from and write to my S3 server?
Edit 3:
Using the debug tool suggested in a comment, here's the result.
It seems like the issue is something to do with the signature; I'm not sure what that means.
First you will need to download the hadoop-aws and aws-java-sdk JARs that match your Spark/Hadoop release and add them to the jars folder inside the Spark folder.
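Concretely, with the versions listed in the question, that amounts to something like (assuming $SPARK_HOME points at the Spark installation):
cp hadoop-aws-2.7.7.jar aws-java-sdk-1.7.4.jar $SPARK_HOME/jars/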
Then you will need to specify the server you will use and enable path-style access if your S3 server does not support dynamic DNS:
sc.hadoopConfiguration.set("fs.s3a.path.style.access","true")
sc.hadoopConfiguration.set("fs.s3a.endpoint","my.domain:8080")
// I had to change the signature version because I have an old S3 API implementation:
sc.hadoopConfiguration.set("fs.s3a.signing-algorithm","S3SignerType")
Here's my final code:
sc.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
val tmp = sc.textFile("s3a://test_bucket/test_file.txt")
sc.hadoopConfiguration.set("fs.s3a.access.key","mykey")
sc.hadoopConfiguration.set("fs.s3a.secret.key","mysecret")
sc.hadoopConfiguration.set("fs.s3a.endpoint","my.domain:8080")
sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled","true")
sc.hadoopConfiguration.set("fs.s3a.path.style.access","true")
sc.hadoopConfiguration.set("fs.s3a.signing-algorithm","S3SignerType")
tmp.count()
I would recommend putting most of the settings inside spark-defaults.conf:
spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.endpoint mydomain:8080
spark.hadoop.fs.s3a.connection.ssl.enabled true
spark.hadoop.fs.s3a.signing-algorithm S3SignerType
One of the issues I had was setting spark.hadoop.fs.s3a.connection.timeout to 10: prior to Hadoop 3 this value is in milliseconds, which effectively gives you a very long wait; the error message would only appear about 1.5 minutes after the attempt to read a file.
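For example, prior to Hadoop 3 a 200-second timeout has to be written in milliseconds:
spark.hadoop.fs.s3a.connection.timeout 200000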
PS:
Special thanks to Steve Loughran.
Thank you very much for the precious help.