We are processing KMS client-side encrypted data in EMR using Spark. I am able to successfully process the encrypted data using the following configuration, but even the aggregated data written to S3 is encrypted. Is there a way to write unencrypted data to S3 with these settings on? If not, how can we decrypt it before loading it into RDS for reporting?
sc._jsc.hadoopConfiguration().set("fs.s3.cse.materialsDescription.enabled", "true");
sc._jsc.hadoopConfiguration().set("fs.s3.cse.encryptionMaterialsProvider", "com.amazon.ws.emr.hadoop.fs.cse.KMSEncryptionMaterialsProvider");
sc._jsc.hadoopConfiguration().set("fs.s3.cse.kms.keyId","arn:aws:kms:us-east-1:abcd");
sc._jsc.hadoopConfiguration().set("fs.s3.cse.enabled", "true");
print('Writing to directory...' + OUTPUT_DIR)
formatted_ags.repartition(1).saveAsTextFile(OUTPUT_DIR)
Take a look at the answer to this question, which describes a workaround for using different encryption configs per custom URI scheme.
To use a different encryption key, or to write decrypted data while saving, we can follow the strategy below.
Configure a new file system scheme and name it s3X, where X can be replaced by any character.
One example configuration is as below.
[
  {
    "classification": "emrfs-site",
    "properties": {
      "fs.s3k.cse.kms.keyId": "arn:aws:kms:us-east-1:XXXXXXXXXXXX",
      "fs.s3k.cse.enabled": "true",
      "fs.s3k.cse.encryptionMaterialsProvider": "com.amazon.ws.emr.hadoop.fs.cse.KMSEncryptionMaterialsProvider"
    }
  },
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.hadoop.fs.s3k.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem"
    }
  }
]
Then use the s3k scheme when saving the output, as in the sketch below. If you want to disable encryption for this scheme, configure "fs.s3k.cse.enabled": "false".
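For example (a minimal sketch re-using the write from the question; the bucket and output path are placeholders):
# With "fs.s3k.cse.enabled" set to "false" in the classification above,
# output written through the s3k scheme is stored unencrypted, while the
# fs.s3.cse settings from the question still apply to s3:// paths.
formatted_ags.repartition(1).saveAsTextFile("s3k://my-bucket/aggregated-output/")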
I've been using Doorkeeper in a Ruby app, and I have an AccessGrant (access code) which is being exchanged for an AccessToken in an OIDC flow.
Given that I am using the default encoders/decoders, I am wondering if there is any structure to the AccessGrant code and if it's possible to peek into its contents.
I tried to UUDecode, Base64/Base64URL-decode, and read the source code, and still cannot figure out any structure to it.
The transit secrets engine returns encrypted data with a prefix:
% vault write transit/encrypt/my-key plaintext=$(base64 <<< "my secret data")
Key           Value
---           -----
ciphertext    vault:v1:C7BqsulaJTww6+zyO+0TnjFUUdDVTQWIatlbxOtEkZbF5govTZAp8S6gjQ==
Is there any way of customization where we can change the vault:v1: prefix to CompanyName:APP:, for example from:
vault:v2:VHTTBb2EyyNYHsa3XiXsvXOQSLKulH+NqS4eRZdtc2TwQCxqJ7PUipvqQ==
So that it becomes:
CompanyName:appV1:0VHTTBb2EyyNYHsa3XiXsvXOQSLKulH+NqS4eRZdtc2TwQCxqJ7PUipvqQ==
Vault has a default version template that evaluates to vault:v{{version}}. There is code that supports a custom version template, but the version_template parameter is ignored when you create the key.
So as of today, this option does not exist, sorry.
This metadata is not encrypted (nor signed). I suggest you either add a prefix to it:
CompanyName:app:vault:v1:0VHTTBb2EyyNYHsa3XiXsvXOQSLKulH+NqS4eRZdtc2TwQCxqJ7PUipvqQ=
Or replace it:
CompanyName:app:v1:0VHTTBb2EyyNYHsa3XiXsvXOQSLKulH+NqS4eRZdtc2TwQCxqJ7PUipvqQ=
To be future-proof (so that you can remove your custom code and use version_template one day), I suggest that you keep a link between my-key (the name of the key) and the prefix. As the code stands today, it is unlikely that Vault will support multiple prefixes for a single key name.
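A minimal sketch of the "replace it" option in Python (the prefix values are the hypothetical ones from the question; swap the prefix before storing, and restore it before calling transit/decrypt):
VAULT_PREFIX = "vault:"
CUSTOM_PREFIX = "CompanyName:app:"

def to_custom(ciphertext):
    # swap the leading "vault:" for the custom prefix before storing;
    # the remainder (v1:<base64>) is kept intact so it can be restored later
    if not ciphertext.startswith(VAULT_PREFIX):
        raise ValueError("unexpected transit ciphertext")
    return CUSTOM_PREFIX + ciphertext[len(VAULT_PREFIX):]

def to_vault(stored):
    # restore the original prefix before sending the value to transit/decrypt
    if not stored.startswith(CUSTOM_PREFIX):
        raise ValueError("unexpected stored ciphertext")
    return VAULT_PREFIX + stored[len(CUSTOM_PREFIX):]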
I have a table in a storage account. I would like to do a test by inserting an entity into this table using a Web Activity, following the guide at this link (https://learn.microsoft.com/en-us/rest/api/storageservices/insert-entity).
I also tried to create a header in the Web Activity settings with the following format for my shared key (https://learn.microsoft.com/en-us/rest/api/storageservices/authorize-with-shared-key):
Authorization="SharedKey <_AccountName>:<_Signature>"
But it seems that there is no function in the dynamic expressions to compute a Hash-based Message Authentication Code (HMAC) for the <_Signature>.
Could someone give me some sample or some hints? Thanks.
We have a provision for using sha2 in the expression builder when using Data Flows.
But when using a Web activity in Data Factory pipelines, you will have to use a workaround. Here is what I tried: call a serverless Function App based on PowerShell to compute the signature.
Basic idea in PowerShell (note that Shared Key requires an HMAC-SHA256 keyed with the storage account key, not a plain SHA-256 hash, and the signature is the Base64 encoding of the raw HMAC bytes):
$ClearString = "String_to_sign"
$AccountKey = "<storage-account-key>"  # the Base64-encoded account key from the portal
$hmac = New-Object System.Security.Cryptography.HMACSHA256
$hmac.Key = [System.Convert]::FromBase64String($AccountKey)
$hash = $hmac.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($ClearString))
$body = [System.Convert]::ToBase64String($hash)  # this Base64 string is the <_Signature>
1. Call the function app:
With body: a JSON containing the String_to_sign
{
    "name": "@{pipeline().parameters.StringToSign}"
}
2. Assign the function app output (the Base64-encoded HMAC signature) to a variable:
@activity('Azure Function1').output.Response
3. Configure the Web activity as per your scenario:
Note: I have used sample data for demonstration purposes; please modify this method as per your need.
Prepare the Authorization header from the signature variable (the function already returns the Base64 signature, so no further encoding is needed):
Authorization: @concat('SharedKey kteststoragee:', variables('sha256'))
Build the Authorization header following the MS doc (Table service, Shared Key authorization); use string functions such as concat to build the final string.
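For reference, a minimal end-to-end sketch of the same signature computation in Python, following the Table service Shared Key format from the MS doc (the account key and table name are placeholders):
import base64
import hashlib
import hmac
from datetime import datetime, timezone

account = "kteststoragee"           # sample account name used above
key_b64 = "<storage-account-key>"   # placeholder: the Base64 account key
table = "mytable"                   # placeholder table name

date = datetime.now(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S GMT")
# Table service string-to-sign: VERB, Content-MD5, Content-Type, Date, CanonicalizedResource
string_to_sign = "\n".join([
    "POST",
    "",
    "application/json",
    date,
    "/%s/%s" % (account, table),
])
digest = hmac.new(base64.b64decode(key_b64),
                  string_to_sign.encode("utf-8"),
                  hashlib.sha256).digest()
signature = base64.b64encode(digest).decode()
authorization = "SharedKey %s:%s" % (account, signature)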
I need to make a call to a REST API from Databricks, preferably using Scala, to get the data and persist it in Databricks. This is the first time I am doing this and I need help. Can any of you please walk me through, step by step, how to achieve this? The API team has already created a service principal and has given it access to the API, so the authentication needs to be done through the SPN.
Thanks!
The REST API is not the recommended approach to ingest data into Databricks.
Reason: the amount of data uploaded by a single API call cannot exceed 1 MB.
To upload a file that is larger than 1 MB to DBFS, use the streaming API, which is a combination of create, addBlock, and close.
Here is an example of how to perform this action using Python.
import json
import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = b'<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)

def dbfs_rpc(action, body):
    """A helper function to make the DBFS API request; request/response is encoded/decoded as JSON"""
    response = requests.post(
        BASE_URL + action,
        headers={'Authorization': b'Bearer ' + TOKEN},  # headers must be a dict
        json=body
    )
    return response.json()

# Create a handle that will be used to add blocks
handle = dbfs_rpc('create', {'path': '/temp/upload_large_file', 'overwrite': 'true'})['handle']

with open('/a/local/file', 'rb') as f:  # binary mode, so blocks are raw bytes
    while True:
        # A block can be at most 1MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block)  # on Python 3, see the decode note below
        dbfs_rpc('add-block', {'handle': handle, 'data': data})

# close the handle to finish uploading
dbfs_rpc('close', {'handle': handle})
For more details, refer to the "DBFS API" documentation.
Hope this helps.
The above code will work; in case you want to upload a jar file or a non-ASCII file, instead of
dbfs_rpc("add-block", {"handle": handle, "data": data})
use
dbfs_rpc("add-block", {"handle": handle, "data": data.decode('UTF8')})
The rest of the details are the same.
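Regarding the service-principal authentication from the original question: a minimal sketch of the Azure AD client-credentials flow in Python (the tenant, client, secret, scope, and API URL are all placeholders; the same flow ports directly to Scala with any HTTP client):
import requests

TENANT_ID = "<tenant-id>"       # placeholder SPN details
CLIENT_ID = "<spn-client-id>"
CLIENT_SECRET = "<spn-secret>"
SCOPE = "<api-scope>"           # e.g. api://<app-id>/.default

# Acquire a bearer token via the client-credentials grant
token_resp = requests.post(
    "https://login.microsoftonline.com/%s/oauth2/v2.0/token" % TENANT_ID,
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": SCOPE,
    },
)
token_resp.raise_for_status()
access_token = token_resp.json()["access_token"]

# Call the REST API with the token; persist the payload afterwards as needed
api_resp = requests.get("https://<api-host>/<endpoint>",
                        headers={"Authorization": "Bearer " + access_token})
records = api_resp.json()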
I had to migrate a legacy database with clear-text passwords to a PostgreSQL database. I've looked up the best way to encrypt passwords in a database and found the pgcrypto extension with its slow hashing algorithms. (See the pgcrypto documentation for 8.4.)
The migration is done for data and everything is working well.
Now I have to write a CRUD application to handle this data.
I'm wondering what's the best way to use this strong encryption with Grails?
In my model, I've used the afterInsert event to handle this:
def afterInsert() {
    Compte.executeUpdate("update Compte set hashpass=crypt(hashpass, gen_salt('bf', 8)) where id = (:compteId)", [compteId: this.id])
}
I guess that I should also check whether the hashpass field is modified whenever the model is saved. But before that, is there another (better) way to achieve my goal?
Edit: I cannot use the Spring Security bcrypt plugin here. The CRUD application that I'm writing uses CAS SSO, so I don't need such a plugin. The CRUD application manages accounts for another application that I don't own. I just need to create a new account, and modify or delete an existing one. This is very simple. The tricky part is to hack Grails so that it takes the password field into account and uses a specific SQL statement to store it in a PostgreSQL database.
Edit 2:
I've come up with the following code, but it doesn't work:
import javax.crypto.Cipher
import javax.crypto.KeyGenerator
import javax.crypto.SecretKey
import org.apache.commons.codec.binary.Base64

def beforeInsert() {
    hashpass = encodePassword(hashpass)
}

def encodePassword(cleartextpwd) {
    // create a key generator based upon the Blowfish cipher
    KeyGenerator keygenerator = KeyGenerator.getInstance("Blowfish")
    // create a (random, throwaway) key
    SecretKey secretkey = keygenerator.generateKey()
    // create a cipher based upon Blowfish
    Cipher cipher = Cipher.getInstance("Blowfish")
    // initialise cipher with the secret key
    cipher.init(Cipher.ENCRYPT_MODE, secretkey)
    // encrypt the cleartext password
    byte[] encrypted = cipher.doFinal(cleartextpwd.getBytes("UTF-8"))
    return Base64.encodeBase64String(encrypted)
}
I get output that is not a bcrypt hash (i.e., one beginning with $2a$08$).
Edit 3:
I've finally come up with a cleaner Grails solution after reading this wiki page: http://grails.org/Simple+Dynamic+Password+Codec and the bug report http://jira.grails.org/browse/GRAILS-3620
Following advice from @lukelazarovic, I've also used the algorithm from the Spring Security plugin.
Here is my password codec, placed in grails-app/utils:
import grails.plugin.springsecurity.authentication.encoding.BCryptPasswordEncoder

class BlowfishCodec {
    static encode(target) {
        // TODO need to put the logcount = 8 in configuration file
        return new BCryptPasswordEncoder(8).encodePassword(target, null)
    }
}
I've updated my Compte model to call my password encoder before saving / updating the data :
def beforeInsert() {
    hashpass = hashpass.encodeAsBlowfish();
}

def beforeUpdate() {
    if (isDirty('hashpass')) {
        hashpass = hashpass.encodeAsBlowfish();
    }
}
The tricky part is to hack Grails so that it takes the password field into account and uses a specific SQL statement to store it in a PostgreSQL database.
Is there any particular reason to do the hashing in the database?
IMHO it's better to hash the password in Grails, and therefore have code that is not database-specific and is easier to read.
For hashing passwords with the Blowfish (bcrypt) algorithm in Java or Groovy, see Encryption with BlowFish in Java.
The resulting hash begins with the algorithm specification, iteration count, and salt, separated by dollar signs ('$'). So the hash may look like "$2a$08$saltCharacters", where 2a is the algorithm, 08 is the iteration count, then follows the salt, and after the salt is the hash itself.
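To illustrate, a minimal sketch in Python using the third-party bcrypt package (the sample password is a placeholder; the log rounds of 8 mirror the codec above):
import bcrypt

# rounds=8 matches the log rounds used by BlowfishCodec above;
# prefix=b"2a" reproduces the $2a$ variant described in this answer
hashed = bcrypt.hashpw(b"secret", bcrypt.gensalt(rounds=8, prefix=b"2a"))
print(hashed)  # e.g. b'$2a$08$' followed by 22 salt characters and the 31-character hash

# verification re-derives the hash from the salt embedded in the stored value
assert bcrypt.checkpw(b"secret", hashed)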
For a broader explanation, see http://www.techrepublic.com/blog/australian-technology/securing-passwords-with-blowfish. Don't mind that it concerns Blowfish in PHP; the principles apply to Java or Groovy as well.