IllegalAccessError when using DynamoDBMapper to encrypt data in EMR - scala

I followed this document: https://docs.aws.amazon.com/dynamodb-encryption-client/latest/devguide/java-examples.html and set up the encryption client and mapper to encrypt items and batch-save them into a table, but it is not working and throws the error below.
Stack trace details:
ERROR Client: Application diagnostics message: User class threw exception: java.lang.IllegalAccessError: tried to access class com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMappingsRegistry from class com.amazonaws.services.dynamodbv2.datamodeling.AttributeEncryptor
    at com.amazonaws.services.dynamodbv2.datamodeling.AttributeEncryptor.getModelClassMetadata(AttributeEncryptor.java:156)
    at com.amazonaws.services.dynamodbv2.datamodeling.AttributeEncryptor.transform(AttributeEncryptor.java:65)
    at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.transformAttributes(DynamoDBMapper.java:2180)
    at com.amazonaws.services.dynamodbv2.datamodeling.DynamoDBMapper.batchWrite(DynamoDBMapper.java:1229)
    at com.amazonaws.services.dynamodbv2.datamodeling.AbstractDynamoDBMapper.batchSave(AbstractDynamoDBMapper.java:193)
    at com.amazon.payrolldatalakeemr.awsoperations.DDBOperations$.batchSaveInDDB(DDBOperations.scala:40)
Config details:
AWSJavaSDKExternalRelease = 1.11.x;
# Spark dependencies
Spark-core = 2.2.1;
Spark-sql = 2.2.1;
DaxJavaClient = 1.0;
ANTLR-Runtime = 3.5.x;
DynamoDbGrammar = 1.0;
Lombok = 1.16.x;
LombokUtils = 1.1;
Maven-com-amazonaws_aws-dynamodb-encryption-java = 1.x;
Mapper code:
import com.amazonaws.services.dynamodbv2.datamodeling.{AttributeEncryptor, DynamoDBMapper, DynamoDBMapperConfig}
import com.amazonaws.services.dynamodbv2.datamodeling.encryption.DynamoDBEncryptor
import com.amazonaws.services.dynamodbv2.datamodeling.encryption.providers.DirectKmsMaterialProvider
import com.amazonaws.services.kms.{AWSKMS, AWSKMSClientBuilder}

def getDynamoDBMapper(region: String): DynamoDBMapper = {
  val cmkArn = "*****************************"
  val kms: AWSKMS = AWSKMSClientBuilder.standard().withRegion(region).build()
  val cmp: DirectKmsMaterialProvider = new DirectKmsMaterialProvider(kms, cmkArn)
  val encryptor: DynamoDBEncryptor = DynamoDBEncryptor.getInstance(cmp)
  val mapperConfig: DynamoDBMapperConfig = DynamoDBMapperConfig.builder.withSaveBehavior(DynamoDBMapperConfig.SaveBehavior.CLOBBER).build
  // ddclient is the AmazonDynamoDB client created elsewhere
  new DynamoDBMapper(ddclient, mapperConfig, new AttributeEncryptor(encryptor))
}

Resolved after adding the Spark property:
--conf spark.driver.userClassPathFirst=true
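For reference, this is what the submit command looks like with that property set; a minimal sketch where the main class and JAR location are placeholders (note that spark.driver.userClassPathFirst only takes effect in cluster deploy mode):
spark-submit \
  --deploy-mode cluster \
  --conf spark.driver.userClassPathFirst=true \
  --class com.example.MyDynamoDbEncryptionJob \
  s3://my-bucket/my-emr-job.jar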

Related

Unsupported authentication token, scheme='none' only allowed when auth is disabled: { scheme='none' } - Neo4j Authentication Error

I am trying to connect to Neo4j from Spark using the neo4j-spark-connector. I am facing an authentication issue when I try to connect to Neo4j: org.neo4j.driver.v1.exceptions.AuthenticationException: Unsupported authentication token, scheme='none' only allowed when auth is disabled: { scheme='none' }
I have checked that the credentials I am passing are correct. Not sure why it is failing.
import org.neo4j.spark._
import org.apache.spark._
import org.graphframes._
import org.apache.spark.sql.SparkSession
import org.neo4j.driver.v1.GraphDatabase
import org.neo4j.driver.v1.AuthTokens
val config = new SparkConf()
config.set(Neo4jConfig.prefix + "url", "bolt://localhost")
config.set(Neo4jConfig.prefix + "user", "neo4j")
config.set(Neo4jConfig.prefix + "password", "root")
val sparkSession: SparkSession = SparkSession.builder.config(config).getOrCreate()
val neo = Neo4j(sparkSession.sparkContext)
val graphFrame = neo.pattern(("Person","id"),("KNOWS","null"), ("Employee","id")).partitions(3).rows(1000).loadGraphFrame
println("**********Graphframe Vertices Count************")
graphFrame.vertices.count
println("**********Graphframe Edges Count************")
graphFrame.edges.count
val pageRankFrame = graphFrame.pageRank.maxIter(5).run()
val ranked = pageRankFrame.vertices
ranked.printSchema()
val top3 = ranked.orderBy(ranked.col("pagerank").desc).take(3)
Can someone please take a look and let me know the reason for this?
It might be a configuration issue with your neo4j.conf file. Is this line commented out:
dbms.security.auth_enabled=false
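If auth is enabled and you want to rule out the credentials themselves, a quick sanity check is to query with the plain Bolt driver; a minimal sketch using the GraphDatabase and AuthTokens classes already imported in the question (the default bolt port 7687 and the password from the question are assumptions):
import org.neo4j.driver.v1.{AuthTokens, GraphDatabase}

val driver = GraphDatabase.driver("bolt://localhost:7687", AuthTokens.basic("neo4j", "root"))
val session = driver.session()
try {
  // If this fails with the same AuthenticationException, the problem is the Neo4j
  // auth setup itself rather than the neo4j-spark-connector configuration.
  println(session.run("RETURN 1 AS ok").single().get("ok").asInt())
} finally {
  session.close()
  driver.close()
}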
I had a similar problem; creating the following Spring beans fixed the issue.
@Bean
public org.neo4j.ogm.config.Configuration getConfiguration() {
    return new org.neo4j.ogm.config.Configuration.Builder()
            .credentials("neo4j", "secret")
            .uri("bolt://localhost:7687").build();
}

@Bean
public SessionFactory sessionFactory(org.neo4j.ogm.config.Configuration configuration) {
    return new SessionFactory(configuration, "<your base package>");
}

SSLHandshakeException happens during file upload to AWS S3 via Alpakka

I'm trying to set up Alpakka S3 for file uploads. Here are my configs:
Alpakka S3 dependency:
...
"com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.20"
...
Here is application.conf:
akka.stream.alpakka.s3 {
  buffer = "memory"
  proxy {
    host = ""
    port = 8000
    secure = true
  }
  aws {
    credentials {
      provider = default
    }
  }
  path-style-access = false
  list-bucket-api-version = 2
}
File upload code example:
private val awsCredentials = new BasicAWSCredentials("my_key", "my_secret_key")
private val awsCredentialsProvider = new AWSStaticCredentialsProvider(awsCredentials)
private val regionProvider = new AwsRegionProvider { def getRegion: String = "us-east-1" }
private val settings = new S3Settings(MemoryBufferType, None, awsCredentialsProvider, regionProvider, false, None, ListBucketVersion2)
private val s3Client = new S3Client(settings)(system, materializer)

val fileSource = Source.single(ByteString("ololo blabla bla")) // Source.single, since the bytes are available directly
val fileName = UUID.randomUUID().toString
val s3Sink: Sink[ByteString, Future[MultipartUploadResult]] = s3Client.multipartUpload("my_basket", fileName)

fileSource.runWith(s3Sink)
  .map { result =>
    println(s"${result.location}")
  } recover {
    case ex: Exception => println(s"$ex")
  }
When I run this code I get:
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
What can be the reason?
The certificate problem arises for bucket names containing dots.
You may switch to
akka.stream.alpakka.s3.path-style-access = true to get rid of this.
We're considering making it the default: https://github.com/akka/alpakka/issues/1152
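Applied to the application.conf from the question, the change is just this key (a minimal sketch):
akka.stream.alpakka.s3 {
  # path-style requests keep the bucket name out of the TLS hostname, so bucket
  # names containing dots no longer break the wildcard certificate
  path-style-access = true
}
If the settings are built manually in code as in the question, the corresponding pathStyleAccess boolean passed to S3Settings (the false in the constructor call) would presumably need to become true as well.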

WARN TaskMemoryManager:302 - Failed to allocate a page (16777216 bytes), try again

I am facing this error while running a Spark job in standalone cluster mode.
I have the following code in PySpark:
def join_client_info(self, cur_df):
    raw_clrr_df = sqlContext.read.parquet(hdfs_path + '/data/life400/CLRRPF')\
        .selectExpr(['CLNTNUM as cliNum', 'CLRRROLE'])
    salh_df = sqlContext.read.parquet(hdfs_path + '/data/life400/SALHPF')\
        .selectExpr(['CLNTNUM as cliNum', 'DECL_GR_SALARY as proposerSalary'])
    spaceDeleteUDF = udf(lambda s: re.sub('[^A-Za-z0-9]+', "", s), StringType())
    clrr_df = raw_clrr_df.withColumn('clientRole', spaceDeleteUDF(raw_clrr_df['CLRRROLE'])).drop('CLRRROLE')
    cli_num = cur_df.select(['cliNum']).collect()[0]['cliNum']
    number_of_pols_lf = clrr_df.filter('cliNum=' + cli_num)\
        .where(clrr_df['clientRole'] == 'LF')\
        .count()
    number_of_pols_ow = clrr_df.filter('cliNum=' + cli_num)\
        .where(clrr_df['clientRole'] == 'OW')\
        .count()
    with_lf_num_of_policies = cur_df.withColumn('numberOfPolsIn', lit(number_of_pols_lf))
    with_lf_ow_num_of_policies = with_lf_num_of_policies.withColumn('numberOfPolsOw', lit(number_of_pols_ow))
    # print(cur_df)
    with_proposer_sal = salh_df.filter('cliNum=' + cli_num)
    return with_lf_ow_num_of_policies.join(with_proposer_sal, 'cliNum', 'inner')
If I uncomment the "print(cur_df)" line, it works fine and doesn't give me an error. I find this behaviour weird. What am I missing here?

Scala - Flink Monitoring API (Upload Jobs)

Good day, I have an issue uploading jobs to the Flink API using Scala.
All GET requests seem to work:
import scalaj.http._
val url: String = "http://127.0.0.1:8081"
val response: HttpResponse[String] = Http(url+"/config").asString
return response
Uploading a JAR file through curl works:
curl -vvv -X POST -H "Expect:" -F "jarfile=@/home/Downloads/myJob.jar" http://127.0.0.1:8081/jars/upload
Now I would like to upload using Scala.
The documentation does not provide a working example and I am fairly new to this type of POST: https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/rest_api.html#submitting-programs
Currently my code is (does not work):
Taken from https://github.com/Guru107/flinkjobuploadplugin/tree/master/src/main/java/com/github/guru107 and edited to my needs.
// Ideal case is to upload a JAR file as a multipart request in Scala
import java.io.{File, IOException}
import org.apache.http.HttpEntity
import org.apache.http.client.methods.{CloseableHttpResponse, HttpPost}
import org.apache.http.entity.mime.MultipartEntityBuilder
import org.apache.http.impl.client.{CloseableHttpClient, HttpClients, LaxRedirectStrategy}
import org.apache.http.message.BasicHeader
import org.apache.http.util.EntityUtils
import org.json.JSONObject // assuming org.json; adjust to the JSON library actually used

val requestUrl = "http://localhost:8081/jars/upload"
val jarPath = "#/home/Downloads/myJob.jar"
val httpClient: CloseableHttpClient = HttpClients.custom.setRedirectStrategy(new LaxRedirectStrategy).build
val fileToUpload: File = new File(jarPath)
val uploadFileUrl: HttpPost = new HttpPost(requestUrl)
val builder: MultipartEntityBuilder = MultipartEntityBuilder.create
builder.addBinaryBody("jarfile", fileToUpload)
val multipart: HttpEntity = builder.build
var jobUploadResponse: JSONObject = null
uploadFileUrl.setEntity(multipart)
var response: CloseableHttpResponse = null
try {
  response = httpClient.execute(uploadFileUrl)
  println("response: " + response)
  response.setHeader(new BasicHeader("Expect", ""))
  response.setHeader(new BasicHeader("content-type", "application/x-java-archive"))
  val bodyAsString = EntityUtils.toString(response.getEntity, "UTF-8")
  println("bodyAsString: " + bodyAsString)
  jobUploadResponse = new JSONObject(bodyAsString)
  println("jobUploadResponse: " + jobUploadResponse)
}
It fails to upload the file.
Please provide a working example, or a link to one, of uploading a job/JAR file to Flink in Scala.
Thanks in advance.
You can use the client code from com.github.mjreid.flinkwrapper and upload the JAR file with Scala code:
val apiEndpoint: String = as.settings.config.getString("flink.url") //http://<flink_web_host>:<flink_web_port>
val client = FlinkRestClient(apiEndpoint, as)
client.runProgram(<jarId>)
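Alternatively, if you want to keep the plain Apache HttpClient approach from the question, here is a minimal sketch (assuming the httpclient and httpmime artifacts are on the classpath); note that the content type is attached to the multipart body part rather than set on the response, and the jar path has no leading '#' (that prefix is curl-only syntax):
import java.io.File
import org.apache.http.client.methods.{CloseableHttpResponse, HttpPost}
import org.apache.http.entity.ContentType
import org.apache.http.entity.mime.MultipartEntityBuilder
import org.apache.http.impl.client.{CloseableHttpClient, HttpClients}
import org.apache.http.util.EntityUtils

val requestUrl = "http://localhost:8081/jars/upload"
val jarFile = new File("/home/Downloads/myJob.jar") // plain filesystem path, no '#'

val httpClient: CloseableHttpClient = HttpClients.createDefault()
val post = new HttpPost(requestUrl)
// "jarfile" is the form field name the upload endpoint expects (same as in the curl call)
val multipart = MultipartEntityBuilder.create()
  .addBinaryBody("jarfile", jarFile, ContentType.create("application/x-java-archive"), jarFile.getName)
  .build()
post.setEntity(multipart)

var response: CloseableHttpResponse = null
try {
  response = httpClient.execute(post)
  val body = EntityUtils.toString(response.getEntity, "UTF-8")
  println(s"upload response: $body")
} finally {
  if (response != null) response.close()
  httpClient.close()
}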

Accessing postgres using slick is not working

I have the following environment: Scala 2.11.8 / Akka 2.4.8 / Slick 3.1.1 / PostgreSQL 9.6.
I have done the following configuration in application.conf:
mydb {
  driver = "slick.driver.PostgresDriver$"
  db {
    url = "jdbc:postgresql://localhost:5432/mydb"
    driver = org.postgresql.Driver
    user = "postgres"
    password = "postgres"
    numThreads = 10
    connectionPool = disabled
    keepAliveConnection = true
  }
}
The DB access is done in this class:
package mib

import slick.driver.PostgresDriver.api._
import scala.concurrent.ExecutionContext.Implicits.global

class DBAccess {
  import scala.concurrent.Future
  import scala.concurrent._
  import scala.concurrent.duration._
  import slick.backend.DatabaseConfig
  import slick.driver.JdbcProfile
  import slick.driver.PostgresDriver
  import slick.driver.PostgresDriver.api._
  import slick.jdbc.JdbcBackend.Database

  println("creating database")
  val dbConfig: DatabaseConfig[PostgresDriver] = DatabaseConfig.forConfig("mydb")
  val db = dbConfig.db
  try {
    val accesspoints = TableQuery[mibPoint]
    // SELECT mib_id FROM mib_non_info
    val q = for (a <- accesspoints) yield a.mib_id
    val dbAction = q.result
    val f: Future[Seq[String]] = db.run(dbAction)
    Await.result(f, Duration.Inf)
    f.onSuccess { case s => println(s"Result: $s") }
  } catch {
    case _: Throwable => println("got some exception")
  } finally {
    db.close
  }
}

// This is a class that represents the table I've created in the database
class mibPoint(tag: Tag) extends Table[(String, Double, Double)](tag, "mib_non_info") {
  def mib_id = column[String]("mib_id", O.PrimaryKey)
  def lat = column[Double]("lat")
  def lng = column[Double]("lng")
  def * = (mib_id, lat, lng)
}
This class is called from the App object as:
object wmib extends App {
  val mWBootStrapper = new bootStrap
  mWBootStrapper.ReadProperties()
  val mdB = new DBAccess
}
However, after running, I always get the output "got some exception".
I have tried to enable logging using slf4j/logback but still do not see much in the logs.
The above seems very trivial and probably I am missing something obvious.
Thanks in advance,
Vishal
I added the exception handling as suggested by sarvesh. That was cool, and thank you.
However, my problem vanished and there was no exception.
What happened?
Earlier in the day, I had attempted to access the DB the plain Java JDBC way,
just to check that there was nothing wrong with the DB or DB access.
In the process, I downloaded and added the Postgres driver to the classpath. Earlier that was not the case.
Since the driver was now on the classpath, the code just worked.
Since I was not printing the exception, I was not realizing the error.
I then removed the driver jar AND I got the following error:
01:44:08.224 [mydb.db-1] DEBUG slick.jdbc.JdbcBackend.statement - Preparing statement: select "mib_id" from "mibpoint"
01:44:08.224 [mydb.db-1] DEBUG slick.jdbc.DriverDataSource - Driver org.postgresql.Driver not already registered; trying to load it
java.lang.ClassNotFoundException: org.postgresql.Driver
at java.lang.ClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at slick.util.ClassLoaderUtil$$anon$1.loadClass(ClassLoaderUtil.scala:12)
at slick.jdbc.DriverDataSource$$anonfun$init$2.apply(DriverDataSource.scala:60)
at slick.jdbc.DriverDataSource$$anonfun$init$2.apply(DriverDataSource.scala:58)
at scala.Option.getOrElse(Option.scala:121)
Thanks to all for helping.
Vishal
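For reference, the suggested exception handling presumably amounts to printing the caught exception instead of a fixed message; a minimal sketch of the changed catch block, reusing db and the query q from the DBAccess class in the question:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

try {
  val f: Future[Seq[String]] = db.run(q.result)
  Await.result(f, Duration.Inf)
  f.onSuccess { case s => println(s"Result: $s") }
} catch {
  // Print the real failure instead of swallowing it; this is what surfaced the
  // ClassNotFoundException for org.postgresql.Driver shown above.
  case t: Throwable =>
    println(s"got some exception: $t")
    t.printStackTrace()
} finally {
  db.close
}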
I was running into the same connection issues when first using Slick. I wrote up the details of how to connect to a local Postgres server here: https://github.com/slick/slick/issues/1861#issuecomment-387616310.
But basically, try editing your build.sbt and application.conf files.
The 2020 answer:
You have to make sure of two things:
Add the driver to build.sbt's libraryDependencies: "org.postgresql" % "postgresql" % "42.2.5". That lets java.sql.DriverManager's getDrivers method (which Slick uses in DriverDataSource) find org.postgresql.Driver.
Make sure that the database URL in application.conf follows the full JDBC URL pattern, as described in the source code: https://github.com/slick/slick/blob/42d787b4950fe876569b5fd68e98c4e0379ac83c/slick/src/main/scala/slick/jdbc/DatabaseUrlDataSource.scala#L9. For example: postgresql://user:password@localhost:5432/postgres.
My full configuration is:
build.sbt
libraryDependencies ++= Seq(
  ...,
  "org.postgresql" % "postgresql" % "42.2.5"
)
application.conf
slick-postgres {
  profile = "slick.jdbc.PostgresProfile$"
  db {
    dataSourceClass = "slick.jdbc.DatabaseUrlDataSource"
    properties = {
      driver = "org.postgresql.Driver"
      url = "postgresql://postgres:postgres@localhost:5432/postgres"
    }
  }
}
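Loading that configuration from code would then look something like this (a minimal sketch, assuming Slick 3.2+ where DatabaseConfig lives in slick.basic and the profile key above is used):
import slick.basic.DatabaseConfig
import slick.jdbc.PostgresProfile

// Reads the "slick-postgres" block from the application.conf shown above
val dbConfig = DatabaseConfig.forConfig[PostgresProfile]("slick-postgres")
val db = dbConfig.db
// ... run queries with db.run(...) and close the pool with db.close() when done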
Another option for application.conf:
mydb {
  dataSourceClass = "slick.jdbc.DatabaseUrlDataSource"
  properties = {
    driver = "slick.driver.PostgresDriver$"
    url = "postgres://postgresql:postgresql@localhost:5432/mydb"
  }
}
Or you can try something like:
mydb = {
  dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
  properties = {
    url = "jdbc:postgresql://localhost:5432/mydb"
    user = "postgres"
    password = "postgres"
  }
  numThreads = 10
}
You need the Postgres Driver on the classpath:
Try adding "org.postgresql" % "postgresql" % "42.1.4" to your libraryDependencies.