AWS EMR: submit a Python (PySpark) script as a step using Terraform

I have successfully created an EMR cluster using Terraform. The Terraform documentation specifies how to submit a step to EMR as a JAR:
https://www.terraform.io/docs/providers/aws/r/emr_cluster.html#step-1
step {
  action_on_failure = "TERMINATE_CLUSTER"
  name              = "Setup Hadoop Debugging"

  hadoop_jar_step {
    jar  = "command-runner.jar"
    args = ["state-pusher-script"]
  }
}
However, documentation for adding a PySpark script as a step is missing.
Does anyone have experience adding a PySpark script as an EMR step using Terraform?

A common way to do this is to copy a script from S3 and use command-runner.jar to execute the script. (I don't know that it's ideal...)
step = [
  {
    name              = "Copy script"
    action_on_failure = "CONTINUE"

    hadoop_jar_step {
      jar  = "command-runner.jar"
      args = ["aws", "s3", "cp", "s3://path/to/script.py", "/home/hadoop/"]
    }
  },
  {
    name              = "Run script"
    action_on_failure = "CONTINUE"

    hadoop_jar_step {
      jar  = "command-runner.jar"
      args = ["spark-submit", "/home/hadoop/script.py"]
    }
  },
]

Here is a working example of using hadoop_jar_step.
step = [
  {
    action_on_failure = "TERMINATE_CLUSTER"
    name              = "Setup Hadoop Debugging"

    hadoop_jar_step = [
      {
        jar        = "command-runner.jar"
        args       = ["state-pusher-script"]
        main_class = ""
        properties = {}
      }
    ]
  }
]
This contains a workaround for https://github.com/hashicorp/terraform-provider-aws/issues/20911.
Change args to your spark-submit command, as sketched below.
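For example, a step that runs a PySpark script through spark-submit could look like the following sketch; the S3 path s3://my-bucket/script.py and the --deploy-mode flag are illustrative placeholders, not values from the question:
step = [
  {
    name              = "Run PySpark script"
    action_on_failure = "CONTINUE"

    hadoop_jar_step = [
      {
        jar        = "command-runner.jar"
        # on EMR, spark-submit can read the application script straight from S3
        args       = ["spark-submit", "--deploy-mode", "cluster", "s3://my-bucket/script.py"]
        main_class = ""
        properties = {}
      }
    ]
  }
]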

Related

How to run gitlab-ci.yml for Gitlab Custom Runner?

I'm trying to add a custom gitlab runner. Here is a config.toml file:
concurrent = 1
check_interval = 0
log_level = "debug"

[session_server]
  session_timeout = 7200

[[runners]]
  name = "MyName"
  url = "MyUrl"
  id = 180
  token = "MyToken"
  token_obtained_at = 2022-09-07T11:19:22Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "custom"
  shell = "pwsh"
  builds_dir = "/builds"
  cache_dir = "/cache"
  [runners.custom]
    prepare_exec = "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
    prepare_args = [ "C:\\GitLab-Runner\\prepare.ps1" ]
    prepare_exec_timeout = 200
    run_exec = "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
    run_args = [ "C:\\GitLab-Runner\\run.ps1" ]
It executes the PowerShell scripts, but I also need it to run the script from my gitlab-ci.yml file, as happens with other executor types. It does not run that file and does not copy the script or any other files to my PC; it only works with the .ps1 scripts. My gitlab-ci.yml file:
stages:
  - test

test:
  stage: test
  tags:
    - my-test
  script:
    - .\my-test.cmd

my-test.cmd:
@echo Hello > C:\Temp\cmd-test.txt
How can I make the runner execute this script, or at least copy the files to my PC so I can run them manually with PowerShell?
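With the custom executor, GitLab Runner generates the job script (containing the commands from gitlab-ci.yml) itself and calls run_exec with two extra arguments: the path to that generated script and the name of the stage. So run.ps1 has to execute the script it is handed. A minimal sketch of what C:\GitLab-Runner\run.ps1 could look like follows; the temp-file handling and exit-code propagation are assumptions for illustration, not taken from the question:
param (
    [string]$ScriptPath,  # path to the job script generated by GitLab Runner
    [string]$Stage        # name of the stage being executed
)

# The generated file may not carry a .ps1 extension, so copy it to one before invoking it.
$tmp = Join-Path $env:TEMP ("gitlab-step-" + [guid]::NewGuid().ToString() + ".ps1")
Copy-Item -Path $ScriptPath -Destination $tmp

& $tmp
$code = if ($LASTEXITCODE) { $LASTEXITCODE } else { 0 }

Remove-Item $tmp -ErrorAction SilentlyContinue
exit $code  # a non-zero exit code tells the runner the job failed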

Jenkins dynamic choice parameter to read an Ansible host file in GitHub

I have an Ansible host file that is stored in GitHub and was wondering if there is a way to list out all the hosts in Jenkins with choice parameters. Right now, every time I update the host file in GitHub I have to manually go into each Jenkins job and update the choice parameter. Thanks!
I'm assuming your host file has content similar to the below.
[client-app]
client-app-preprod-01.aws-xxxx
client-app-preprod-02.aws
client-app-preprod-03.aws
client-app-preprod-04.aws
[server-app]
server-app-preprod-01.aws
server-app-preprod-02.aws
server-app-preprod-03.aws
server-app-preprod-04.aws
Option 01
You can do something like the below. Here you first check out the repo and then ask for user input. I have implemented the function getHostList() to parse the host file and filter out the host entries.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                git 'https://github.com/jglick/simple-maven-project-with-tests.git'
                script {
                    def selectedHost = input message: 'Please select the host', ok: 'Next',
                        parameters: [
                            choice(name: 'PRODUCT', choices: getHostList("client-app", "ansible/host/location"), description: 'Please select the host')
                        ]
                    echo "Host:::: $selectedHost"
                }
            }
        }
    }
}

def getHostList(def appName, def filePath) {
    def hosts = []
    def content = readFile(file: filePath)
    def startCollect = false
    for (def line : content.split('\n')) {
        if (line.contains("[" + appName + "]")) { // This is the starting point of the host entries
            startCollect = true
            continue
        } else if (startCollect) {
            if (!line.allWhitespace && !line.contains('[')) {
                hosts.add(line.trim())
            } else {
                break
            }
        }
    }
    return hosts
}
Option 2
If you want to do this without checking out the source, and with job parameters, you can do something like the below using the Active Choice Parameter plugin. If your repository is private, you will need to figure out a way to generate an access token to access the raw GitHub link.
properties([
    parameters([
        [$class: 'ChoiceParameter',
            choiceType: 'PT_SINGLE_SELECT',
            description: 'Select the Host',
            name: 'Host',
            script: [
                $class: 'GroovyScript',
                fallbackScript: [
                    classpath: [],
                    sandbox: false,
                    script: 'return [\'Could not get Host\']'
                ],
                script: [
                    classpath: [],
                    sandbox: false,
                    script: '''
                        def appName = "client-app"
                        def content = new URL("https://raw.githubusercontent.com/xxx/sample/main/testdir/hosts").getText()
                        def hosts = []
                        def startCollect = false
                        for (def line : content.split("\\n")) {
                            if (line.contains("[" + appName + "]")) { // This is the starting point of the host entries
                                startCollect = true
                                continue
                            } else if (startCollect) {
                                if (!line.allWhitespace && !line.contains("[")) {
                                    hosts.add(line.trim())
                                } else {
                                    break
                                }
                            }
                        }
                        return hosts
                    '''
                ]
            ]
        ]
    ])
])
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                script {
                    echo "Host:::: ${params.Host}"
                }
            }
        }
    }
}
Update
When you are calling a private repo, you need to send a Basic Auth header with the access token, so use the following Groovy script instead.
def accessToken = "ACCESS_TOKEN".bytes.encodeBase64().toString()
def get = new URL("https://raw.githubusercontent.com/xxxx/something/hosts").openConnection();
get.setRequestProperty("authorization", "Basic " + accessToken)
def content = get.getInputStream().getText()

Azure Terraform function app deployment issue

I hope somebody can help me with this issue because I don't understand what I am doing wrong.
I am trying to build an Azure function app and deploy a zip package (timer trigger) to it.
I have this code:
resource "azurerm_resource_group" "function-rg" {
  location = "westeurope"
  name     = "resource-group"
}

data "azurerm_storage_account_sas" "sas" {
  connection_string = azurerm_storage_account.sthriprdeurcsvtoscim.primary_connection_string
  https_only        = true
  start             = "2021-01-01"
  expiry            = "2023-12-31"

  resource_types {
    object    = true
    container = false
    service   = false
  }

  services {
    blob  = true
    queue = false
    table = false
    file  = false
  }

  permissions {
    read    = true
    write   = false
    delete  = false
    list    = false
    add     = false
    create  = false
    update  = false
    process = false
  }
}

resource "azurerm_app_service_plan" "ASP-rg-hri-prd-scim" {
  location                     = azurerm_resource_group.function-rg.location
  name                         = "ASP-rghriprdeurcsvtoscim"
  resource_group_name          = azurerm_resource_group.function-rg.name
  kind                         = "functionapp"
  maximum_elastic_worker_count = 1
  per_site_scaling             = false
  reserved                     = false

  sku {
    capacity = 0
    size     = "Y1"
    tier     = "Dynamic"
  }
}

resource "azurerm_storage_container" "deployments" {
  name                  = "function-releases"
  storage_account_name  = azurerm_storage_account.sthriprdeurcsvtoscim.name
  container_access_type = "private"
}

resource "azurerm_storage_blob" "appcode" {
  name                   = "functionapp.zip"
  storage_account_name   = azurerm_storage_account.sthriprdeurcsvtoscim.name
  storage_container_name = azurerm_storage_container.deployments.name
  type                   = "Block"
  source                 = "./functionapp.zip"
}

resource "azurerm_function_app" "func-hri-prd-eur-csv-to-scim" {
  storage_account_name       = azurerm_storage_account.sthriprdeurcsvtoscim.name
  storage_account_access_key = azurerm_storage_account.sthriprdeurcsvtoscim.primary_access_key
  app_service_plan_id        = azurerm_app_service_plan.ASP-rg-hri-prd-scim.id
  location                   = azurerm_resource_group.function-rg.location
  name                       = "func-hri-prd-csv-to-scim"
  resource_group_name        = azurerm_resource_group.function-rg.name

  app_settings = {
    "WEBSITE_RUN_FROM_PACKAGE"    = "https://${azurerm_storage_account.sthriprdeurcsvtoscim.name}.blob.core.windows.net/${azurerm_storage_container.deployments.name}/${azurerm_storage_blob.appcode.name}${data.azurerm_storage_account_sas.sas.sas}"
    "FUNCTIONS_EXTENSION_VERSION" = "~3"
    "FUNCTIONS_WORKER_RUNTIME"    = "dotnet"
  }

  enabled = true

  identity {
    type = "SystemAssigned"
  }

  version                = "~3"
  enable_builtin_logging = false
}

resource "azurerm_storage_account" "sthriprdeurcsvtoscim" {
  account_kind              = "Storage"
  account_replication_type  = "LRS"
  account_tier              = "Standard"
  allow_blob_public_access  = false
  enable_https_traffic_only = true
  is_hns_enabled            = false
  location                  = azurerm_resource_group.function-rg.location
  name                      = "sthriprdeurcsvtoscim"
  resource_group_name       = azurerm_resource_group.function-rg.name
}
It goes without saying that terraform apply works without any error. The configuration of the function app is correct and points to the right storage account, and the storage account has a container with the zip file containing my Azure Function code.
But when I go to the function app -> Functions, I don't see any function there.
Can somebody please help me understand what I am doing wrong here?
The function app is a .NET function on the ~3 runtime.
When you create a function app, it isn't set up for Functions + Terraform; it's set up for a Visual Studio Code + Functions deployment. We need to adjust both the package.json so that it produces the ZIP file for us, and the .gitignore so that it ignores the Terraform build files. I use a bunch of helper NPM packages:
azure-functions-core-tools for the func command.
@ffflorian/jszip-cli to ZIP my files up.
mkdirp for creating directories.
npm-run-all and particularly the run-s command for executing things in order.
rimraf for deleting things.
Below is what the package.json looks like:
{
  "name": "backend",
  "version": "1.0.0",
  "description": "",
  "scripts": {
    "func": "func",
    "clean": "rimraf build",
    "build:compile": "tsc",
    "build:prune": "npm prune --production",
    "prebuild:zip": "mkdirp --mode=0700 build",
    "build:zip": "jszip-cli",
    "build": "run-s clean build:compile build:zip",
    "predeploy": "npm run build",
    "deploy": "terraform apply"
  },
  "dependencies": {
  },
  "devDependencies": {
    "azure-functions-core-tools": "^2.7.1724",
    "@azure/functions": "^1.0.3",
    "@ffflorian/jszip-cli": "^3.0.2",
    "mkdirp": "^0.5.1",
    "npm-run-all": "^4.1.5",
    "rimraf": "^3.0.0",
    "typescript": "^3.3.3"
  }
}
npm run build will build the ZIP file.
npm run deploy will build the ZIP file and deploy it to Azure.
For complete information check Azure Function app with Terraform.

Not able to run an exe in a Jenkins pipeline using PowerShell

I am trying to execute a process written in C# through a Jenkins pipeline during the build and deployment process.
It is a simple executable that takes 3 arguments. When it is called from the Jenkins pipeline using a PowerShell function, it doesn't write any of the logs that are plentiful within the exe's code, nor does the pipeline log show anything about what happened to the process, while the log output before and after the execution is clean, i.e. "Started..." and "end." get printed in the Jenkins build log.
When I try to run the same exe on a server directly with the same PowerShell script, it runs perfectly fine. Could you please let me know how I can determine what is going wrong here, or how I can make the logs more verbose so I can figure out the root cause?
Here is the code snippet
build-utils.ps1
function disable-utility($workspace) {
    # the code here fetches the executable and its supporting libraries from the artifactory location and unzips it on the build agent server.
    # below is the call to the executable
    Type $xmlPath  # this prints the whole contents of the xml file which is being used as an input to my exe.
    echo "disable exe path exists : $(Test-Path ""C:\Jenkins\workspace\utils\disable.exe"")"  # output is TRUE
    echo "Started..."
    Start-Process -NoNewWindow -FilePath "C:\Jenkins\workspace\utils\disable.exe" -ArgumentList "-f $xmlPath 0"  # $xmlPath is the path to an xml file
    echo "end."
}
jenkinsfile
library identifier: 'jenkins-library@0.2.14',
    retriever: legacySCM([
        $class: 'GitSCM',
        userRemoteConfigs: [[
            credentialsId: 'BITBUCKET_RW',
            url: '<https://gitRepoUrl>'
        ]]
    ])

def executeStep(String stepName) {
    def butil = '.\\build\\build-utils.ps1'
    if (fileExists(butil)) {
        def status = powershell(returnStatus: true, script: "& { . '${butil}'; ${stepName}; }")
        echo "$status"
        if (status != 0) {
            currentBuild.result = 'FAILURE'
            error("${stepName} failed")
        }
    } else {
        error("failed to find the file")
    }
}

pipeline {
    agent {
        docker {
            image '<path to the docker image to pull a server with VS2017 build tools>'
            label '<image name>'
            reuseNode true
        }
    }
    environment {
        // loading the env variables here
    }
    stages {
        stage('Disable utility') {
            steps {
                executeStep("disable-utility ${env.workspace}")
            }
        }
    }
}
Thanks a lot in advance !
Have you changed it? Go to Regedit, navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System, and set "EnableLUA" = 0.
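If you prefer to script that change instead of editing the registry by hand, something along these lines should work. This is a sketch: it must be run from an elevated PowerShell session, and disabling EnableLUA turns UAC off and only takes effect after a reboot.
# Disable UAC (EnableLUA = 0); requires admin rights and a reboot to take effect
Set-ItemProperty -Path "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System" `
    -Name "EnableLUA" -Value 0 -Type DWord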

Kamon StatsD not sending metrics when I run my Scala application as a Docker container

When I run my Scala application using 'sbt run', it sends Kamon metrics to the graphite/grafana container. Then I created a Docker image for my Scala application and ran it as a Docker container.
Now it is not sending metrics to the graphite/grafana container. Both my application container and the graphite/grafana container are running under the same Docker network.
The command I used to run the grafana image is: docker run --network smart -d -p 80:80 -p 81:81 -p 2003:2003 -p 8125:8125/udp -p 8126:8126 8399049ce731
The Kamon configuration in application.conf is:
kamon {
  auto-start = true

  metric {
    tick-interval = 1 seconds

    filters {
      akka-actor {
        includes = ["*/user/*"]
        excludes = ["*/system/**", "*/user/IO-**", "**/kamon/**"]
      }
      akka-router {
        includes = ["*/user/*"]
        excludes = ["*/system/**", "*/user/IO-**", "**/kamon/**"]
      }
      akka-dispatcher {
        includes = ["*/user/*"]
        excludes = ["*/system/**", "*/user/IO-**", "*kamon*", "*/kamon/*", "**/kamon/**"]
      }
      trace {
        includes = ["**"]
        excludes = []
      }
    }
  }

  # needed for "[error] Exception in thread "main" java.lang.ClassNotFoundException: local"
  internal-config {
    akka.actor.provider = "akka.actor.LocalActorRefProvider"
  }

  statsd {
    hostname = "127.0.0.1"
    port = 8125

    # Subscription patterns used to select which metrics will be pushed to StatsD. Note that first, metrics
    # collection for your desired entities must be activated under the kamon.metrics.filters settings.
    subscriptions {
      histogram       = ["**"]
      min-max-counter = ["**"]
      gauge           = ["**"]
      counter         = ["**"]
      trace           = ["**"]
      trace-segment   = ["**"]
      akka-actor      = ["**"]
      akka-dispatcher = ["**"]
      akka-router     = ["**"]
      system-metric   = ["**"]
      http-server     = ["**"]
    }

    metric-key-generator = kamon.statsd.SimpleMetricKeyGenerator

    simple-metric-key-generator {
      application = "my-application"
      include-hostname = true
      hostname-override = none
      metric-name-normalization-strategy = normalize
    }
  }

  modules {
    kamon-scala.auto-start = yes
    kamon-statsd.auto-start = yes
    kamon-system-metrics.auto-start = yes
  }
}
Your help will be very much appreciated.
It is necessary to add the AspectJ weaver as a Java agent when you're starting the application: -javaagent:aspectjweaver.jar
You can add the following settings in your project SBT configuration:
.settings(
  retrieveManaged := true,
  libraryDependencies += "org.aspectj" % "aspectjweaver" % aspectJWeaverV
)
So the AspectJ weaver JAR will be copied to ./lib_managed/jars/org.aspectj/aspectjweaver/aspectjweaver-[aspectJWeaverV].jar in your project root.
Then you can reference this JAR in your Dockerfile:
COPY ./lib_managed/jars/org.aspectj/aspectjweaver/aspectjweaver-*.jar /app-workdir/aspectjweaver.jar
WORKDIR /app-workdir
CMD ["java", "-javaagent:aspectjweaver.jar", "-jar", "app.jar"]