automate uploading of glue script - aws-cloudformation

We are currently using CloudFormation to create a Glue job (via CodeBuild and CodePipeline). The one thing we are stuck on is how to automate delivering the code that goes into the Glue job.
Our current relevant piece of the cloudformation template looks like this:
MyJob:
  Type: AWS::Glue::Job
  Properties:
    Command:
      Name: glueetl
      ScriptLocation: "s3://aws-glue-scripts//your-script-file.py"
    DefaultArguments:
      "--job-bookmark-option": "job-bookmark-enable"
    ExecutionProperty:
      MaxConcurrentRuns: 2
    MaxRetries: 0
    Name: cf-job1
    Role: !Ref MyJobRole
The problem is the "ScriptLocation". It looks like it is required to be an S3 location. How can we automate the upload of this? The code is in a .py file in our Git repository, and I assume it is uploaded to the artifact repository as part of the CodeBuild process, but how do we access it?
Would like to hear how others are doing this. Thanks!
EDIT: I was able to find a similar Stack Overflow post: AWS Glue automatic job creation, but the answers there don't really give a solution or address the question posed.
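In other words, what we would essentially like the build phase to end up doing is roughly this (the bucket name, key, and parameter name below are placeholders, and ScriptLocation would have to become a template parameter):

# copy the Glue script out of the checked-out repo into S3
aws s3 cp glue/your-script-file.py s3://my-glue-scripts-bucket/jobs/your-script-file.py

# then deploy/update the stack, pointing the job at the uploaded script
aws cloudformation deploy \
  --template-file template.yml \
  --stack-name my-glue-stack \
  --parameter-overrides GlueScriptLocation=s3://my-glue-scripts-bucket/jobs/your-script-file.py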

I've written a tool to handle the upload of stack dependencies, including CloudFormation nested templates and non-inline Lambda functions.
AWS Glue isn't currently handled, since I haven't tried it in any project yet, but it should be easy to extend the tool to support Glue.
The dependencies are defined in a separate config file, and a piece of code within the tool is responsible for handling that config. Here's the sample config:
Nested CloudFormation templates:
# DEPENDS=( <ParameterName>=<NestedTemplate> )
#
# Required: Yes if the stack has nested templates, otherwise No
# Default: None
# Syntax:
#   <ParameterName>: The name of the template parameter that is referenced by
#                    the value of the nested template property `TemplateURL`.
#   <NestedTemplate>: A local path or an S3 URL starting with `s3://` or
#                     `https://` pointing to the nested template.
#                     Local nested templates are uploaded to the S3 bucket
#                     automatically during the deployment.
# Description:
#   Double quote the pairs which contain whitespace or special characters.
#   Use `#` to comment out.
# ---
# Example:
#   DEPENDS=(
#       NestedTemplateFooURL=/path/to/nested/foo/stack.json
#       NestedTemplateBarURL=/path/to/nested/bar/stack.json
#   )
Lambda functions:
# LAMBDA=( <S3BucketParameterName>:<S3KeyParameterName>=<LambdaFunction> )
#
# Required: Yes if there is a non-inline Lambda Function, otherwise No
# Default: None
# Syntax:
#   <S3BucketParameterName>: The name of the template parameter that is
#                            referenced by the Lambda property `Code.S3Bucket`.
#   <S3KeyParameterName>: The name of the template parameter that is
#                         referenced by the Lambda property `Code.S3Key`.
#   <LambdaFunction>: A local path or an S3 URL starting with `s3://` pointing
#                     to the Lambda Function.
#                     Local Lambda Functions are zipped and uploaded to the
#                     S3 bucket automatically during the deployment.
# Description:
#   Double quote the pairs which contain whitespace or special characters.
#   Use `#` to comment out.
# ---
# Example:
#   LAMBDA=(
#       S3BucketForLambdaFoo:S3KeyForLambdaFoo=/path/to/LambdaFoo.py
#       S3BucketForLambdaBar:S3KeyForLambdaBar=s3://mybucket/LambdaBar.py
#   )
The tool is written in bash and comes in two parts:
xsh: It works as a bash library framework.
xsh-lib/aws: It's a library of xsh.
The code you may need to expand is located in xsh-lib/aws/functions/cfn/deploy.sh.
The example deploy command looks like:
$ xsh aws/cfn/deploy -C /path/to/your/template-and-config-dir -t stack.json -c sample.conf
I'm considering abstracting the dependencies, such as CloudFormation templates, Lambda functions, and Glue scripts, into a single interface for both configs and handlers.
This will make it easier to add new dependency handlers to the deployer.
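For Glue, I imagine the config entry could mirror the LAMBDA syntax above, something like this (purely hypothetical; the tool doesn't support it yet):

# GLUE=( <ScriptLocationParameterName>=<GlueScript> )
#
# Example (hypothetical):
#   GLUE=(
#       GlueScriptLocationFoo=/path/to/glue/your-script-file.py
#   )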

Related

Unexpected error while passing variable group variables (Azure DevOps) to YAML pipeline

I'm a newbie to both Azure DevOps and Terraform, but I'm trying to deploy a pipeline using a YAML file.
I have tried to run a terraform plan using a YAML file, passing variables from Azure DevOps, but I got the following error:
2021-11-24T18:39:46.4604561Z Error: "name" may only contain alphanumeric characters, dash, underscores, parentheses and periods
2021-11-24T18:39:46.4604832Z
2021-11-24T18:39:46.4605940Z on modules/aks/main.tf line 2, in resource "azurerm_resource_group" "aks-resource-group":
2021-11-24T18:39:46.4606436Z 2: name = var.resource_group_name
2021-11-24T18:39:46.4606609Z
2021-11-24T18:39:46.4606722Z
2021-11-24T18:39:46.4606818Z
2021-11-24T18:39:46.4607525Z Error: Error: Subnet: (Name "#{vnet_subnet_name}#" / Virtual Network Name "#{vnet_name}#" / Resource Group "RG-XX-XXXX-XXXXX-001") was not found
2021-11-24T18:39:46.4608006Z
2021-11-24T18:39:46.4608580Z on modules/aks/main.tf line 16, in data "azurerm_subnet" "subnet-project":
2021-11-24T18:39:46.4609335Z 16: data "azurerm_subnet" "subnet-project" {
The 'name' has the following format in the variable group in the Azure DevOps UI:
RG-XX-XXXX-XXXXX-001
This is the snippet of the YAML file where I included the replace-tokens task:
displayName: 'Replace Secrets'
inputs:
  targetFiles: |
    variables.tfvars
  encoding: 'utf-8'
  actionOnMissing: fail
  tokenPattern: #{MyVar}#
And this is a sample of the variables I have in a variable group:
[screenshot: variable-group-sample]
Also, I replace the terraform.tfvars file with something like this:
resource_group_name = "#{resource_group_name}#"
I have checked the name entered in the UI several times, but I feel the error is pointing to something else I cannot see.
Has anyone experienced something related to this error?
Thank you in advance!
tokenPattern: #{MyVar}#
It is looking for the literal pattern #{MyVar}# to replace: not "something contained between #{ and }#", but the actual value #{MyVar}#. I'm guessing it's expecting a regular expression, but I'm not familiar with that task.
So the end result is that your #{token values}# aren't getting replaced.
Assuming you're using https://marketplace.visualstudio.com/items?itemName=qetza.replacetokens, you probably want to specify tokenPrefix: #{ and tokenSuffix: }# instead of using tokenPattern.
Now, having said that...
There is no reason for you to be using token replacement on a tfvars file. You should create different tfvars files for each environment, then pass in a tfvars file via the -var-file argument to Terraform. Secrets can be passed in on the command line via -var 'foo=bar'
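For example (the file name and the secret variable below are placeholders):

# one tfvars file per environment, kept in source control; secrets injected from the environment
terraform plan -var-file="environments/dev.tfvars" -var "client_secret=$CLIENT_SECRET"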
Storing variables that represent application or deployment configuration in Azure DevOps (or GitHub, or any other CI system) is a big, big anti-pattern, because it tightly couples your deployment process to a particular platform. If you're sourcing all of your variables from Azure DevOps, you can't easily test locally or migrate to a different CI/CD provider like GitHub Actions in the future.
For values that shouldn't be in source control, such as secrets, you should use a secret provider like Azure Key Vault and integrate it with your application (or, in this case, use a data resource in Terraform to pull the necessary secrets automatically at deployment time).

How to set a variable from another yaml file in azure-pipeline.yml

I have an environment.yml, shown below. I would like to read out the content of the name variable (core-force) and set it as the value of a global variable in my azure-pipeline.yml file. How can I do it?
name: core-force
channels:
- conda-forge
dependencies:
- click
- Sphinx
- sphinx_rtd_theme
- numpy
- pylint
- azure-cosmos
- python=3
- flask
- pytest
- shapely
In my azure-pipeline.yml file I would like to have something like:
variables:
  tag: <the value of 'name' from environment.yml, i.e. 'core-force'>
Please check this example:
File: vars.yml
variables:
  favoriteVeggie: 'brussels sprouts'
File: azure-pipelines.yml
variables:
- template: vars.yml # Template reference
steps:
- script: echo My favorite vegetable is ${{ variables.favoriteVeggie }}.
Please note that variables are simple strings, and if you want to use a list you may need to do some workaround in PowerShell in the place where you want to use the values from that list.
If you don't want to use the template functionality shown above, you need to do the following (see the sketch after this list):
- create a separate job/stage
- define a step there that reads the environment.yml file and sets variables using the REST API or Azure CLI
- create another job/stage and move your current build definition there
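As a rough sketch, the reading step could be a Bash script task that parses the name and publishes it with a logging command (the grep/awk parsing and the variable name 'tag' are just assumptions here):

# read the conda environment name (e.g. 'core-force') from environment.yml
NAME=$(grep '^name:' environment.yml | awk '{print $2}')

# make it available to later steps as the pipeline variable 'tag'
# (add ;isOutput=true and name the step if another job must read it via dependencies)
echo "##vso[task.setvariable variable=tag]$NAME"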
I found this topic on developer community where you can read:
Yaml variables have always been string: string mappings. The doc appears to be currently correct, though we may have had a bug when last you visited.
We are preparing to release a feature in the near future to allow you to pass more complex structures. Stay tuned!
But I don't have more info about this.
Global variables should be stored in a separate template file. Ideally this file would live in a separate repo that other repos can refer to.
Here is another answer for this

How can I get awx to use azure_rm.yml instead of the ini version?

I need to use the "plugin/azure_rm.yml" version of azure_rm instead of the older/deprecated "script/azure_rm.ini" to gather dynamic inventory in Azure, in particular because we need VMs from scalesets to be included in inventory. How can I do this?
You can use an inventory script, which (again) calls ansible-inventory, and have it create the config file via a heredoc:
#!/usr/bin/env bash
cat > azure_rm.yml <<HEREDOC
---
plugin: azure_rm
include_vmss_resource_groups:
  - '*'
hostvar_expressions:
  ansible_host: private_ipv4_addresses | first
plain_host_names: true
keyed_groups:
  # places each host in a group named 'tag_(tag name)_(tag value)' for each tag on a VM.
  - prefix: tag
    key: tags
  # places each host in a group named 'azure_loc_(location name)', depending on the VM's location
  - prefix: azure_loc
    key: location
  # group by platform (to copy prefix from ec2.py), eg: platform_windows
  - prefix: platform
    key: os_disk.operating_system_type
HEREDOC

ansible-inventory -i azure_rm.yml --list
rm azure_rm.yml
So it's literally having ansible-inventory call ansible-inventory, but with a different argument. Note that in order to get the inventory for the correct subscription, you have to create a copy of the credential used, with the desired subscription id; it doesn't appear that you can override AZURE_SUBSCRIPTION_ID via the yml environment param.

How to associate hash-named .wsp files to my tagged graphite metrics?

I use Graphite tagged metrics with Grafana and Whisper, but http://graphite/tags/delSeries removes something while leaving the .wsp files behind.
Untagged metrics create .wsp files with human-readable names in the whisper data folder, but tagged metrics create only hash-named folders and .wsp files in the _tagged directory.
Like so:
/whisper
  /data
    /Players
      registrations.wsp
      today_registrations.wsp
    /Gaming
      playing_count.wsp
  /_tagged
    /f58
      /010
        f58010d4cef67599a31f4daaab4a53c4d7fd85a9faea546282d2058c40c7e7b9.wsp
    /f56
      /031
        f56031052aec89dc9cc38e44dbe71b2eb08fb513a3e60d515eb1dc23f5b929d1.wsp
How can I tell which .wsp file is associated with my tagged metric?
I just ran into that problem as well: how to map the actual path/tag metric to its corresponding hashed wsp file.
I don't think you can compute the actual metric name from the hash, but you can go the other way around by using Graphite's encoding methods.
I've quickly written a Python script just for lab purposes; it takes several metric names as parameters and returns a mapping.
Just log in to your Graphite host and create a Python script in /opt/graphite/webapp/graphite/tags:
#!/opt/graphite/bin/python3
import sys
from utils import TaggedSeries

for line in sys.stdin:
    paths = line.split()
    for path in paths:
        # Normalize first
        parsed = TaggedSeries.parse(path)
        print(path + " -> /opt/graphite/storage/whisper/" + TaggedSeries.encode(parsed.path, '/', True) + ".wsp")
You can then pipe a list of metrics:
# echo "users.count;server=s1" |python mapper.py
users.count;server=s1 -> /opt/graphite/storage/whisper/_tagged/b6c/c91/b6cc916d608e4b145b318669606e79118cc41d316f96735dd43621db4fd2bcaf.wsp
You can also get all your tagged metrics and generate a file that you can later cat into the script. In this example I get all metrics associated with the tag 'server':
# curl -s "http://localhost/tags/findSeries?expr=server=~." | sed s/"\", \""/\\n/g > my_metrics
Then cat your metrics:
# cat my_metrics | python mapper.py
That's a starting point. From there you can easily do some simple scripting to delete wsp files, for example the ones that haven't been updated for a month.
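For instance, a rough sketch (the whisper path and the 30-day threshold are assumptions; review the output before deleting anything):

# list tagged .wsp files that haven't been modified in the last 30 days
find /opt/graphite/storage/whisper/_tagged -name '*.wsp' -mtime +30 -print

# once reviewed, the same command with -delete instead of -print removes them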

How can I hide skipped tasks output in Ansible

I have Ansible role, for example
---
- name: Deploy app1
  include: deploy-app1.yml
  when: 'deploy_project == "{{app1}}"'

- name: Deploy app2
  include: deploy-app2.yml
  when: 'deploy_project == "{{app2}}"'
But I deploy only one app per role call. When I deploy several apps, I call the role several times. But every time there is a lot of skipped-task output (from the tasks that do not pass the condition), which I do not want to see. How can I avoid it?
I'm assuming you don't want to see the skipped tasks in the output while running Ansible.
Set this to false in the ansible.cfg file.
display_skipped_hosts = false
Note: it will still output the name of the task, although it will not display "skipped" anymore.
UPDATE: by the way, you need to make sure ansible.cfg is in the current working directory.
Taken from the ansible.cfg file.
ansible will read ANSIBLE_CONFIG,
ansible.cfg in the current working directory, .ansible.cfg in
the home directory or /etc/ansible/ansible.cfg, whichever it
finds first.
So ensure you are setting display_skipped_hosts = false in the right ansible.cfg file.
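For example, a minimal project-local config (a sketch; if you already have an ansible.cfg with a [defaults] section, add the setting there instead):

# create a project-local ansible.cfg so it takes precedence over the home and global ones
cat > ansible.cfg <<'EOF'
[defaults]
display_skipped_hosts = false
EOF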
Let me know how you go
In Ansible 2.4, a callback plugin named full_skip was added to suppress the skipped task names and the skipping keyword in the Ansible output. You can try the below Ansible configuration:
[defaults]
stdout_callback = full_skip
Ansible allows you to control its output by using custom callbacks.
In this case you can simply use the skippy callback which will not output anything on a skipped task.
That said, skippy is now deprecated and will be removed in ansible v2.11.
If you don't mind losing colours you can elide the skipped tasks by piping the output through sed:
ansible-playbook whatever.yml | sed -nr '/^TASK/{h;n;/^skipping:/{n;b};H;x};p'
If you are using roles, you can use when to cancel the include in main.yml
# roles/myrole/tasks/main.yml
- include: somefile.yml
  when: somevar is defined

# roles/myrole/tasks/somefile.yml
- name: this task will only run (and be seen in the output) if somevar is defined
  debug:
    msg: "Hello World"