Alpakka S3 library for "assume role"

Alpakka S3 library for "assume role" - scala

I am trying to push records to AWS S3 with Aplakka S3 library. The issue is due to security issues, I have to "assume role", and my IAM user doesn't have access to PUT. with aws cli, I could have successfully pushed to s3 with --profile parameter in aws s3 --profile <profile>. I want to know HOW TO ASSUME ROLE IN ALPAKKA S3 LIBRARY?
my application.conf file has the credentials as in https://github.com/akka/alpakka/blob/v3.0.4/s3/src/main/resources/reference.conf
:
alpakka.s3 {
# default values for AWS configuration
aws {
# to use the same configuration as if credentials.provider = default
credentials {
# static credentials
provider = default //static
access-key-id = <> //valid access key exists in original code
secret-access-key = <>
}
and my PUTing code is:
object Experiments extends App{
//AWS S3 configs
val s3BucketName = config.getString("alpakka.s3.bucket_name")
val s3BucketRegion = config.getString("alpakka.s3.bucket_region")
val bucket_path = config.getString("alpakka.s3.path_inside_bucket")
Source.single(record)
.map { Record => println("Record value: " + Record)
val K_record = //some parsing for the record into JSON
//push to S3
val bucketKey = s"$bucket_path/${K_record.session_id}.json"
try{
Source.single(ByteString(K_record.featureSet))
.runWith(S3.multipartUpload(bucket = s3BucketName, key = bucketKey))
}
catch{
case e2:Exception => println(s"S3 pushing error: ${e2.getMessage}")
}
}
.runWith(Sink.ignore)

Related

How to specify HTTP authentication (user, password) when using an URL for libvirt_volume.source

I am trying to Provision VMs with Terraform.
i give the source image for the vm
here:
source = "http://10.1.1.160/Builds/14.7.1.10_0.39637/output/KVM/14.7.1.10_0.39637-disk1.qcow2"
but this site require a user name and a password.
where and how can i specify the credentials of this site in my tf file?
this is my main.tf file:
terraform {
required_providers {
libvirt = {
source = "dmacvicar/libvirt"
}
}
}
provider "libvirt" {
uri = "qemu:///system"
}
resource "libvirt_volume" "centos7-qcow2" {
name = "centos7.qcow2"
pool = "default"
source = "http://10.1.1.160/Builds/14.7.1.10_0.39637/output/KVM/14.7.1.10_0.39637-disk1.qcow2"
format = "qcow2"
}
# Define KVM domain to create
resource "libvirt_domain" "gw" {
name = "gw"
memory = "8192"
vcpu = 4
network_interface {
network_name = "default"
}
disk {
volume_id = "${libvirt_volume.centos7-qcow2.id}"
}
console {
type = "pty"
target_type = "serial"
target_port = "0"
}
graphics {
type = "spice"
listen_type = "address"
autoport = true
}
}
when i run terraform apply i got this error:
*Error while determining image type for http://10.1.1.160/Builds/14.7.1.10_0.39637/output/KVM/14.7.1.10_0.39637-disk1.qcow2: Can't retrieve partial header of resource to determine file type: http://10.1.1.160/Builds/14.7.1.10_0.39637/output/KVM/14.7.1.10_0.39637-disk1.qcow2 - 401 Authorization Required
with libvirt_volume.centos7-qcow2,
on main.tf line 13, in resource "libvirt_volume" "centos7-qcow2":
13: resource "libvirt_volume" "centos7-qcow2" {*
thanks for helping!

I just added the password and the user to the URL, like this:
http://user:password#host/path
:)

Extract Embedded AWS Glue Connection Credentials Using Scala

I have a glue job that reads directly from redshift, and to do that, one has to provide connection credentials. I have created an embedded glue connection and can extract the credentials with the following pyspark code. Is there a way to do this in Scala?
glue = boto3.client('glue', region_name='us-east-1')
response = glue.get_connection(
Name='name-of-embedded-connection',
HidePassword=False
)
table = spark.read.format(
'com.databricks.spark.redshift'
).option(
'url',
'jdbc:redshift://prod.us-east-1.redshift.amazonaws.com:5439/db'
).option(
'user',
response['Connection']['ConnectionProperties']['USERNAME']
).option(
'password',
response['Connection']['ConnectionProperties']['PASSWORD']
).option(
'dbtable',
'db.table'
).option(
'tempdir',
's3://config/glue/temp/redshift/'
).option(
'forward_spark_s3_credentials', 'true'
).load()

There is no scala equivalent from AWS to issue this API call.But you can use Java SDK code inside scala as mentioned in this answer.
This is the Java SDK call for getConnection and if you don't want to do this then you can follow below approach:
Create AWS Glue python shell job and retrieve the connection information.
Once you have the values then call the other scala Glue job with these as arguments inside your python shell job as shown below :
glue = boto3.client('glue', region_name='us-east-1')
response = glue.get_connection(
Name='name-of-embedded-connection',
HidePassword=False
)
response = client.start_job_run(
JobName = 'my_scala_Job',
Arguments = {
'--username': response['Connection']['ConnectionProperties']['USERNAME'],
'--password': response['Connection']['ConnectionProperties']['PASSWORD'] } )
Then access these parameters inside your scala job using getResolvedOptions as shown below:
import com.amazonaws.services.glue.util.GlueArgParser
val args = GlueArgParser.getResolvedOptions(
sysArgs, Array(
"username",
"password")
)
val user = args("username")
val pwd = args("password")

SSLHandshakeException happens during file upload to AWS S3 via Alpakka

I'm trying to setup an Alpakka S3 for files upload purpose. Here is my configs:
alpakka s3 dependency:
...
"com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.20"
...
Here is application.conf:
akka.stream.alpakka.s3 {
buffer = "memory"
proxy {
host = ""
port = 8000
secure = true
}
aws {
credentials {
provider = default
}
}
path-style-access = false
list-bucket-api-version = 2
}
File upload code example:
private val awsCredentials = new BasicAWSCredentials("my_key", "my_secret_key")
private val awsCredentialsProvider = new AWSStaticCredentialsProvider(awsCredentials)
private val regionProvider = new AwsRegionProvider { def getRegion: String = "us-east-1" }
private val settings = new S3Settings(MemoryBufferType, None, awsCredentialsProvider, regionProvider, false, None, ListBucketVersion2)
private val s3Client = new S3Client(settings)(system, materializer)
val fileSource = Source.fromFuture(ByteString("ololo blabla bla"))
val fileName = UUID.randomUUID().toString
val s3Sink: Sink[ByteString, Future[MultipartUploadResult]] = s3Client.multipartUpload("my_basket", fileName)
fileSource.runWith(s3Sink)
.map {
result => println(s"${result.location}")
} recover {
case ex: Exception => println(s"$ex")
}
When I run this code I get:
javax.net.ssl.SSLHandshakeException: General SSLEngine problem
What can be a reason?

The certificate problem arises for bucket names containing dots.
You may switch to
akka.stream.alpakka.s3.path-style-access = true to get rid of this.
We're considering making it the default: https://github.com/akka/alpakka/issues/1152

No FileSystem for scheme: cos

I'm trying to connect to IBM Cloud Object Storage from IBM Data Science Experience:
access_key = 'XXX'
secret_key = 'XXX'
bucket = 'mybucket'
host = 'lon.ibmselect.objstor.com'
service = 'mycos'
sqlCxt = SQLContext(sc)
hconf = sc._jsc.hadoopConfiguration()
hconf.set('fs.cos.myCos.access.key', access_key)
hconf.set('fs.cos.myCos.endpoint', 'http://' + host)
hconf.set('fs.cose.myCos.secret.key', secret_key)
hconf.set('fs.cos.service.v2.signer.type', 'false')
obj = 'mydata.tsv.gz'
rdd = sc.textFile('cos://{0}.{1}/{2}'.format(bucket, service, obj))
print(rdd.count())
This returns:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.io.IOException: No FileSystem for scheme: cos
I'm guessing I need to use the 'cos' scheme based on the stocator docs. However, the error suggests stocator isn't available or is an old version?
Any ideas?
Update 1:
I have also tried the following:
sqlCxt = SQLContext(sc)
hconf = sc._jsc.hadoopConfiguration()
hconf.set('fs.cos.impl', 'com.ibm.stocator.fs.ObjectStoreFileSystem')
hconf.set('fs.stocator.scheme.list', 'cos')
hconf.set('fs.stocator.cos.impl', 'com.ibm.stocator.fs.cos.COSAPIClient')
hconf.set('fs.stocator.cos.scheme', 'cos')
hconf.set('fs.cos.mycos.access.key', access_key)
hconf.set('fs.cos.mycos.endpoint', 'http://' + host)
hconf.set('fs.cos.mycos.secret.key', secret_key)
hconf.set('fs.cos.service.v2.signer.type', 'false')
service = 'mycos'
obj = 'mydata.tsv.gz'
rdd = sc.textFile('cos://{0}.{1}/{2}'.format(bucket, service, obj))
print(rdd.count())
However, this time the response was:
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.io.IOException: No object store for: cos
at com.ibm.stocator.fs.ObjectStoreVisitor.getStoreClient(ObjectStoreVisitor.java:121)
...
Caused by: java.lang.ClassNotFoundException: com.ibm.stocator.fs.cos.COSAPIClient

The latest version of Stocator (v1.0.9) that supports fs.cos scheme is not yet deployed on Spark aaService (It will be soon). Please use the stocator scheme "fs.s3d" to connect to your COS.
Example:
endpoint = 'endpointXXX'
access_key = 'XXX'
secret_key = 'XXX'
prefix = "fs.s3d.service"
hconf = sc._jsc.hadoopConfiguration()
hconf.set(prefix + ".endpoint", endpoint)
hconf.set(prefix + ".access.key", access_key)
hconf.set(prefix + ".secret.key", secret_key)
bucket = 'mybucket'
obj = 'mydata.tsv.gz'
rdd = sc.textFile('s3d://{0}.service/{1}'.format(bucket, obj))
rdd.count()
Alternatively, you can use ibmos2spark. The lib is already installed on our service. Example:
import ibmos2spark
credentials = {
'endpoint': 'endpointXXXX',
'access_key': 'XXXX',
'secret_key': 'XXXX'
}
configuration_name = 'os_configs' # any string you want
cos = ibmos2spark.CloudObjectStorage(sc, credentials, configuration_name)
bucket = 'mybucket'
obj = 'mydata.tsv.gz'
rdd = sc.textFile(cos.url(obj, bucket))
rdd.count()

Stocator is on the classpath for Spark 2.0 and 2.1 kernels, but the cos scheme is not configured. You can access the config by executing the following in a Python notebook:
!cat $SPARK_CONF_DIR/core-site.xml
Look for the property fs.stocator.scheme.list. What I currently see is:
<property>
<name>fs.stocator.scheme.list</name>
<value>swift2d,swift,s3d</value>
</property>
I recommend that you raise a feature request against DSX to support the cos scheme.

It looks like cos driver is not properly initialized. Try this configuration:
hconf.set('fs.cos.impl', 'com.ibm.stocator.fs.ObjectStoreFileSystem')
hconf.set('fs.stocator.scheme.list', 'cos')
hconf.set('fs.stocator.cos.impl', 'com.ibm.stocator.fs.cos.COSAPIClient')
hconf.set('fs.stocator.cos.scheme', 'cos')
hconf.set('fs.cos.mycos.access.key', access_key)
hconf.set('fs.cos.mycos.endpoint', 'http://' + host)
hconf.set('fs.cos.mycos.secret.key', secret_key)
hconf.set('fs.cos.service.v2.signer.type', 'false')
UPDATE 1:
You also need to ensure stocator classes are on the classpath. You can use packages system by exceuting pyspark in the following way:
./bin/pyspark --packages com.ibm.stocator:stocator:1.0.24
This works with swift2d and cos scheme.
UPDATE 2:
Just follow Stocator documentation (https://github.com/CODAIT/stocator). It contains all details how to install it, what branch to use, etc.

I found the same issue, and to solve it I just changed environment:
Within IBM Watson Studio, if you start a a Jupyter notebook in an environment without a pre-configured spark cluster, than you get that error. Installing PySpark is not enough.
Instead, if you start a notebook with the Spark cluster available, you will be just fine.

You have to set .config("spark.hadoop.fs.stocator.scheme.list", "cos") along with some others fs.cos... configurations.
Here's an end-to-end snippet code example that works (tested with pyspark==2.3.2 and Python 3.7.3):
from pyspark.sql import SparkSession
stocator_jar = '/path/to/stocator-1.1.2-SNAPSHOT-IBM-SDK.jar'
cos_instance_name = '<myCosIntanceName>'
bucket_name = '<bucketName>'
s3_region = '<region>'
cos_iam_api_key = '*******'
iam_servicce_id = 'crn:v1:bluemix:public:iam-identity::<****************>'
spark_builder = (
SparkSession
.builder
.appName('test_app'))
spark_builder.config('spark.driver.extraClassPath', stocator_jar)
spark_builder.config('spark.executor.extraClassPath', stocator_jar)
spark_builder.config(f"fs.cos.{cos_instance_name}.iam.api.key", cos_iam_api_key)
spark_builder.config(f"fs.cos.{cos_instance_name}.endpoint", f"s3.{s3_region}.cloud-object-storage.appdomain.cloud")
spark_builder.config(f"fs.cos.{cos_instance_name}.iam.service.id", iam_servicce_id)
spark_builder.config("spark.hadoop.fs.stocator.scheme.list", "cos")
spark_builder.config("spark.hadoop.fs.cos.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
spark_builder.config("fs.stocator.cos.impl", "com.ibm.stocator.fs.cos.COSAPIClient")
spark_builder.config("fs.stocator.cos.scheme", "cos")
spark_sess = spark_builder.getOrCreate()
dataset = spark_sess.range(1, 10)
dataset = dataset.withColumnRenamed('id', 'user_idx')
dataset.repartition(1).write.csv(
f'cos://{bucket_name}.{cos_instance_name}/test.csv',
mode='overwrite',
header=True)
spark_sess.stop()
print('done!')

AWS: InvalidSignature exception while adding record

InvalidSignatureException occurs when trying to add user record using Kinesis Producer library.
AWS_JAVA_SDK_VERSION=1.10.26
AWS_KINESIS_PRODUCER_VERSION=0.10.1
ERROR:
PutRecords failed: {"__type":"InvalidSignatureException","message":"The request signature we calculated does not match the signature you provided. Check your AWS Secret Access Key and signing method.
SCALA KINESIS PRODUCER CODE
private val configuration: KinesisProducerConfiguration = new KinesisProducerConfiguration
val credentialsProvider: AWSCredentialsProvider = AwsUtil.getAwsCredentials(config.awsAccessKey, config.awsSecretKey)
configuration.setCredentialsProvider(credentialsProvider)
configuration.setRecordMaxBufferedTime(config.timeLimit)
configuration.setAggregationMaxCount(1)
configuration.setRegion(config.streamRegion)
configuration.setMetricsLevel("none")
private val kinesisProducer = new KinesisProducer(configuration)
kinesisProducer.addUserRecord(streamName, key, eventBytes)`
The above code is not working. But its possible for me to add records to kinesis stream through aws cli from terminal and KinesisClient in code which is specified below.
private def createKinesisClient = {
val accessKey = config.awsAccessKey
val secretKey = config.awsSecretKey
val credentialsProvider: AWSCredentialsProvider = AwsUtil.getAwsCredentials(accessKey, secretKey)
val client = new AmazonKinesisClient(credentialsProvider)
client.setEndpoint(config.streamEndpoint)
client
}

This happens because your VM/PC/Server clock might by skewed.
If you're running ubuntu, try updating your system time:
sudo ntpdate ntp.ubuntu.com
If you are using docker-machine on Mac, you can resolve with this command:
docker-machine ssh default 'sudo ntpclient -s -h pool.ntp.org'

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Alpakka S3 library for "assume role" - scala

Related

How to specify HTTP authentication (user, password) when using an URL for libvirt_volume.source

Extract Embedded AWS Glue Connection Credentials Using Scala

SSLHandshakeException happens during file upload to AWS S3 via Alpakka

No FileSystem for scheme: cos

AWS: InvalidSignature exception while adding record

Categories

Resources