EMR cluster bootstrap + setting environment variables cluster-wide - pyspark

I am trying to create an EMR cluster from the command line and give it some bootstrap actions and a configuration file.
The aim is to set some Spark/YARN variables, plus some other environment variables that should be available cluster-wide (on the master AND the worker nodes).
The configuration file I am passing looks like this:
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "appMasterEnv.SOME_VAR": "123",
          "nodemanager.vmem-check-enabled": "false",
          "executor.memoryOverhead": "5g"
        },
        "Configurations": []
      }
    ]
  },
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "appMasterEnv.SOME_VAR": "123",
          "PYSPARK_DRIVER_PYTHON": "python36",
          "PYSPARK_PYTHON": "python36",
          "driver.memoryOverhead": "14g",
          "driver.memory": "14g",
          "executor.memory": "14g"
        },
        "Configurations": []
      }
    ]
  }
]
However, when I add steps to the cluster, a step fails because it cannot find the environment variable SOME_VAR:
Traceback (most recent call last):
File "..", line 9, in <module>.
..
raise EnvironmentError
OSError
(The line number is where I am trying to use the environment var SOME_VAR)
Am I doing it the right way both for SOME_VAR and the other Spark/Yarn vars?
Thank you

Remove the appMasterEnv. prefix from appMasterEnv.SOME_VAR, as user lenin suggested, so that the key is just SOME_VAR.
Use the yarn-env classification to pass environment variables to the worker nodes.
Use the spark-env classification to pass environment variables to the driver when the deploy mode is client. When the deploy mode is cluster, use yarn-env.
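A minimal Python sketch of this advice, assuming only SOME_VAR needs to be exported (the helper name emr_env_configurations is hypothetical, not part of any AWS tooling). It builds the same structure as the configuration file in the question, with the variable exported under both yarn-env and spark-env and without the appMasterEnv. prefix.

```python
import json


def emr_env_configurations(env_vars):
    """Return EMR configuration objects that export env_vars via yarn-env and spark-env."""

    def export_block(classification):
        # Each classification wraps a nested "export" classification holding the variables.
        return {
            "Classification": classification,
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": dict(env_vars),
                    "Configurations": [],
                }
            ],
        }

    return [export_block("yarn-env"), export_block("spark-env")]


# SOME_VAR and its value are taken from the question.
configurations = emr_env_configurations({"SOME_VAR": "123"})
print(json.dumps(configurations, indent=2))
```

The printed JSON can be saved to a file and passed to the cluster the same way as the original configuration file. Entries such as executor.memory appear to be Spark/YARN properties rather than environment variables, so they are left out of this sketch.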

Related

Pass environment/input argument to command argument in step function

I'm trying to set up a step function with an EKS run job. The job kicks off a pod in an EKS cluster and executes commands. As a start, I want the command to be echo $S3_BUCKET $S3_KEY, where both $S3_BUCKET and $S3_KEY are environment variables passed in from the step function input. Here is the container spec:
"containers": [
  {
    "name": "my-container-spec",
    "image": "****.dkr.ecr.****.amazonaws.com/****:latest",
    "command": [
      "echo"
    ],
    "args": [
      "$S3_BUCKET", "$S3_KEY"
    ],
    "env": [
      {
        "name": "S3_BUCKET",
        "value.$": "$.s3_bucket"
      },
      {
        "name": "S3_KEY",
        "value.$": "$.s3_key"
      }
    ]
  }
],
"restartPolicy": "Never"
Unfortunately, after the job is executed, the command only echoes the raw text $S3_BUCKET $S3_KEY instead of the passed-in values.
So the question is how I should pass an environment variable as an argument. The environment variable doesn't have to be passed in; it could be another inherited variable.
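Kubernetes does not run the command through a shell, so $S3_BUCKET is passed to echo literally. Use the $(VAR_NAME) syntax instead, which Kubernetes expands from the container's env entries. This will do the trick: "args": ["$(S3_BUCKET)", "$(S3_KEY)"]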
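To illustrate the difference between the two spellings, here is a simplified Python model of the $(VAR_NAME) substitution. It is only a sketch, not Kubernetes code: the function name expand_args and the sample values are made up, and details such as the $$ escape are ignored.

```python
import re

# Matches $(NAME) references; NAME follows the usual environment-variable shape.
_REFERENCE = re.compile(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)")


def expand_args(args, env):
    """Expand $(NAME) references in args from env, leaving unknown names untouched."""

    def substitute(match):
        return env.get(match.group(1), match.group(0))

    return [_REFERENCE.sub(substitute, arg) for arg in args]


env = {"S3_BUCKET": "example-bucket", "S3_KEY": "path/to/object"}  # made-up values

# The $(NAME) form is substituted from the environment.
print(expand_args(["$(S3_BUCKET)", "$(S3_KEY)"], env))

# The shell-style $NAME form is not recognised and is passed through unchanged.
print(expand_args(["$S3_BUCKET", "$S3_KEY"], env))
```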

Use command inside a VSCode configuration

As per the documentation given here, I wish to add a text prompt box when I start my debug configuration. My launch.json file is as follows:
{
  "version": "2.0.0",
  "configurations": [
    {
      "name": "Docker Attach my container",
      "type": "coreclr",
      "request": "attach",
      "processId": "${command:pickRemoteProcess}",
      "pipeTransport": {
        "pipeProgram": "docker",
        "pipeArgs": [ "exec", "-i", "${input:containerName}" ],
        "debuggerPath": "/vsdbg/vsdbg",
        "pipeCwd": "${workspaceRoot}",
        "quoteArgs": false
      }
    }
  ],
  "inputs": [
    {
      "id": "containerName",
      "type": "promptString",
      "description": "Please enter container name",
      "default": "my-container"
    }
  ]
}
However, with this configuration VSCode does not prompt me to enter the container name. Any ideas why this would be the case?
As a further question, ideally I wish to execute a shell script that runs docker ps plus some grep to filter out the correct container name automatically. If that can be done and the result passed to this configuration as an argument, that would be ideal.
For the second part, you can use the Command Variable extension to use the content of a file as a variable, or via a key-value pair.
Write a shell script that runs your docker ps and grep and writes the result to a file, and run it in a preLaunchTask.
Use the command extension.commandvariable.file.content in an ${input:xxxx} variable so that the extension reads the content of the file for use in the launch command.
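The answer suggests a shell script for the lookup step; the same idea is sketched here in Python, assuming the docker CLI is on the PATH. The function names and the output path are hypothetical, and how the preLaunchTask invokes the script is left out.

```python
import re
import subprocess
from pathlib import Path


def list_container_names():
    """Return the names of running containers, one per line of `docker ps` output."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def pick_container(names, pattern):
    """Return the first name matching pattern, or raise if none matches."""
    for name in names:
        if re.search(pattern, name):
            return name
    raise LookupError(f"no running container matches {pattern!r}")


def write_container_name(out_path, pattern, names=None):
    """Write the selected container name to out_path for the launch configuration to read."""
    if names is None:
        names = list_container_names()
    name = pick_container(names, pattern)
    Path(out_path).write_text(name)
    return name
```

A task could then call something like write_container_name("container-name.txt", "my-container"), and the file it writes would be the one read through extension.commandvariable.file.content.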

Packer - Powershell pass variables

Currently we are successfully deploying images with Packer (in a build pipeline in Azure DevOps) within our AWS domain. Now we want to take this a step further and configure a couple of users for future Ansible maintenance. So we've written a script and also tried it as an inline PowerShell script, but neither option seems to pick up the variable that is set in the variable group in Azure DevOps; all the other variables are used successfully. My code is as follows:
{
  "variables": {
    "build_version": "{{isotime \"2006.01.02.150405\"}}",
    "aws_access_key": "$(aws_access_key)",
    "aws_secret_key": "$(aws_secret_key)",
    "region": "$(region)",
    "vpc_id": "$(vpc_id)",
    "subnet_id": "$(subnet_id)",
    "security_group_id": "$(security_group_id)",
    "VagrantUserpassword": "$(VagrantUserPassword)"
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "access_key": "{{user `aws_access_key`}}",
      "secret_key": "{{user `aws_secret_key`}}",
      "region": "{{user `region`}}",
      "vpc_id": "{{user `vpc_id`}}",
      "subnet_id": "{{user `subnet_id`}}",
      "security_group_id": "{{user `security_group_id`}}",
      "source_ami_filter": {
        "filters": {
          "name": "Windows_Server-2016-English-Full-Base-*",
          "root-device-type": "ebs",
          "virtualization-type": "hvm"
        },
        "most_recent": true,
        "owners": [
          "801119661308"
        ]
      },
      "ami_name": "WIN2016-CUSTOM-{{user `build_version`}}",
      "instance_type": "t3.xlarge",
      "user_data_file": "userdata.ps1",
      "associate_public_ip_address": true,
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "winrm_timeout": "15m",
      "winrm_use_ssl": true,
      "winrm_insecure": true,
      "ssh_interface": "private_ip"
    }
  ],
  "provisioners": [
    {
      "type": "powershell",
      "environment_vars": ["VagrantUserPassword={{user `VagrantUserPassword`}}"],
      "inline": [
        "Install-WindowsFeature web-server,web-webserver,web-http-logging,web-stat-compression,web-dyn-compression,web-asp-net,web-mgmt-console,web-asp-net45",
        "New-LocalUser -UserName 'Vagrant' -Description 'User is responsible for Ansible connection.' -Password '$(VagrantUserPassword)'"
      ]
    },
    {
      "type": "powershell",
      "environment_vars": ["VagrantUserPassword={{user `VagrantUserPassword`}}"],
      "scripts": [
        "scripts/DisableUAC.ps1",
        "scripts/iiscompression.ps1",
        "scripts/ChocoPackages.ps1",
        "scripts/PrepareAnsibleUser.ps1"
      ]
    },
    {
      "type": "windows-restart",
      "restart_check_command": "powershell -command \"& {Write-Output 'Machine restarted.'}\""
    },
    {
      "type": "powershell",
      "inline": [
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
        "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
      ]
    }
  ]
}
The "VagrantUserpassword": "$(VagrantUserPassword)" entry is what is not working; we've tried multiple options, but none of them seem to work.
Any ideas?
Kind regards,
Rick.
Based on my test, pipeline variables indeed cannot be passed to the PowerShell environment variable this way.
Workaround:
You could use the Replace Tokens task to write the pipeline value into the JSON file.
Here are the steps:
1. Set the value in the JSON file:
{
  "variables": {
    ....
    "VagrantUserpassword": "#{VagrantUserPassword}#"
  },
2. Use the Replace Tokens task before the script task.
3. Set the value in the pipeline variables.
The value is then set successfully.
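As a rough illustration of what the token replacement step does to the file, here is a minimal Python sketch. It is not the Azure DevOps task itself: the function name replace_tokens, the sample password, and the choice to fail on unknown tokens are all assumptions, and values are inserted without any JSON escaping.

```python
import re

# Tokens look like #{Name}#, matching the placeholder used in the JSON file.
_TOKEN = re.compile(r"#\{(\w+)\}#")


def replace_tokens(text, values):
    """Replace each #{Name}# token with values[Name]; fail loudly on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for token {name!r}")
        return values[name]

    return _TOKEN.sub(substitute, text)


# A one-line stand-in for the Packer template; the password is a placeholder.
template = '{"variables": {"VagrantUserpassword": "#{VagrantUserPassword}#"}}'
print(replace_tokens(template, {"VagrantUserPassword": "example-password"}))
```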
On the other hand, I also found some issues in your sample file.
In "environment_vars": ["VagrantUserPassword={{user `VagrantUserPassword`}}"], the user variable name must match the one declared under "variables", which is VagrantUserpassword, so it should read ["VagrantUserPassword={{user `VagrantUserpassword`}}"].
Note: This is case sensitive.
In the PowerShell commands, you need to use $Env:VagrantUserPassword instead of $(VagrantUserPassword).
For example:
"inline": [
  "Write-Host \"Automatically generated aws password is: $Env:VagrantUserPassword\"",
  "Write-Host \"Automatically generated aws password is: $Env:VAR5\""
]

How to run a PowerShell script during Azure VM deployment with ARM template?

I want to deploy a VM in Azure using Azure Resource Manager (ARM), and then run a PowerShell script inside the VM post-deployment to configure it.
I can do this fine with something like this: https://github.com/Azure/azure-quickstart-templates/tree/master/201-vm-vsts-agent
However, that template grabs the PowerShell script from GitHub. As part of my deployment I want to upload the script to Azure Storage, and then have the VM get the script from Azure Storage and run it. How can I handle the dependency on the PowerShell script, given that it has to exist somewhere in Azure Storage before being executed?
I currently have this to install a VSTS agent as part of a deployment, but the script is downloaded from GitHub. I don't want that; I want the VSTS agent installation script to be part of my ARM project.
{
  "name": "vsts-build-agents",
  "type": "extensions",
  "location": "[parameters('location')]",
  "apiVersion": "2017-12-01",
  "dependsOn": [
    "vsts-build-vm"
  ],
  "tags": {
    "displayName": "VstsInstallScript"
  },
  "properties": {
    "publisher": "Microsoft.Compute",
    "type": "CustomScriptExtension",
    "typeHandlerVersion": "1.9",
    "settings": {
      "fileUris": [
        "[concat(parameters('_artifactsLocation'), '/', variables('powerShell').folder, '/', variables('powerShell').script, parameters('_artifactsLocationSasToken'))]"
      ]
    },
    "protectedSettings": {
      "commandToExecute": "[concat('powershell.exe -ExecutionPolicy Unrestricted -Command \"& {', './', variables('powerShell').script, ' ', variables('powerShell').buildParameters, '}\"')]"
    }
  }
}
I guess my question is really about how to set _azurestoragelocation to an Azure Storage location where the script has just been uploaded as part of the deployment.
It's a chicken-and-egg problem. You cannot upload to Azure Storage with an ARM template; you need a script to do the upload, but if that script is already on the VM, you don't really need to upload it.
That being said, why don't you use the VSTS agent extension?
{
  "name": "xxx",
  "apiVersion": "2015-01-01",
  "type": "Microsoft.Resources/deployments",
  "properties": {
    "mode": "Incremental",
    "templateLink": {
      "uri": "https://gallery.azure.com/artifact/20161101/microsoft.vsts-agent-windows-arm.1.0.0/Artifacts/MainTemplate.json"
    },
    "parameters": {
      "vmName": {
        "value": "xxx"
      },
      "location": {
        "value": "xxx"
      },
      "VSTSAccountName": {
        "value": "xxx"
      },
      "TeamProject": {
        "value": "xxx"
      },
      "DeploymentGroup": {
        "value": "Default"
      },
      "AgentName": {
        "value": "xxx"
      },
      "PATToken": {
        "value": "xxx"
      }
    }
  }
},
Do you mean how to set _artifactsLocation as in the quickstart sample? If so, you have two options (or three, depending):
1) Use the script in the QS repo; the defaultValue for the _artifactsLocation param will set that for you.
2) If you want to customize, use the Deploy-AzureResourceGroup.ps1 script from your local copy of the sample; it will stage the artifacts and set the value for you accordingly (when you use the -UploadArtifacts switch).
3) Stage the PS1 somewhere yourself and manually set the values of _artifactsLocation and _artifactsLocationSasToken.
You can also deploy from gallery.azure.com, but that will force you to use the script that is stored in the gallery (same as using the defaults in GitHub).
Does that help?
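For the manual route (option 3), the sketch below shows in Python how the two parameter values combine into the script URI, mirroring the concat() expression in the question's fileUris. Every value is a placeholder, the helper name script_uri is hypothetical, and the sketch assumes the SAS token already starts with "?".

```python
def script_uri(artifacts_location, folder, script, sas_token=""):
    """Join the pieces the same way the template's concat() expression does."""
    return f"{artifacts_location}/{folder}/{script}{sas_token}"


# All values below are placeholders, not real endpoints or tokens.
parameters = {
    "_artifactsLocation": {"value": "https://example.blob.core.windows.net/artifacts"},
    "_artifactsLocationSasToken": {"value": "?sv=placeholder"},
}

print(
    script_uri(
        parameters["_artifactsLocation"]["value"],
        "scripts",  # stands in for variables('powerShell').folder
        "install-agent.ps1",  # stands in for variables('powerShell').script
        parameters["_artifactsLocationSasToken"]["value"],
    )
)
```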

How to dynamically name an ECS cluster with cloudformation?

It's easy to create the cluster MyCluster with a hardcoded name:
"MyCluster": {
  "Type": "AWS::ECS::Cluster"
}
However, I want a dynamic name while still being able to reference the named resource. Something like this, where the cluster name would be the stack name:
"NamedReferenceButNotClusterName": {
  "Type": "AWS::ECS::Cluster",
  "Properties": {
    "Name": {"Ref": "AWS::StackName"} <-- Name property isn't allowed
  }
},
"ecsService": {
  "Type": "AWS::ECS::Service",
  "DependsOn": [
    {"Ref": "NamedReferenceButNotClusterName"} <-- not sure if I can even do this
  ],
  "Properties": {
    "Cluster": {
      "Ref": "NamedReferenceButNotClusterName" <-- I really want this part
    },
    "DesiredCount": 2,
    "TaskDefinition": {
      "Ref": "EcsTask"
    }
  }
}
Is there any way to do this?
It's not possible with AWS CloudFormation.
"MyCluster": {
  "Type": "AWS::ECS::Cluster"
}
The above CloudFormation template will generate an ECS cluster with a name of the form <StackName>-MyCluster-<RandomSequence>.
The stack name is provided as input when the template is executed. The random sequence is generated by CloudFormation and is not deterministic.
At this point, the best bet for creating a cluster with a desired naming convention is to use the AWS CLI or a small program using the AWS SDK.
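Even with a generated name, the service can still refer to the cluster by its logical ID, because Ref on an AWS::ECS::Cluster resource resolves to the cluster's name. The Python sketch below builds such a template fragment; it assumes EcsTask is defined elsewhere in the template, as in the question, and it is an illustration rather than a complete stack.

```python
import json

template = {
    "Resources": {
        "MyCluster": {"Type": "AWS::ECS::Cluster"},
        "ecsService": {
            "Type": "AWS::ECS::Service",
            # DependsOn takes logical IDs as plain strings, not Ref objects.
            "DependsOn": ["MyCluster"],
            "Properties": {
                # Ref resolves to the generated cluster name at deploy time.
                "Cluster": {"Ref": "MyCluster"},
                "DesiredCount": 2,
                # EcsTask is assumed to be declared elsewhere in the template.
                "TaskDefinition": {"Ref": "EcsTask"},
            },
        },
    }
}

print(json.dumps(template, indent=2))
```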