Is there a way to pass default parameters in Spark Job - scala

When configuring Spark jobs, I would like JobId and RunId to be passed by default, whether or not the user passes them as parameters.
The reason is that I would like to fetch JobId and RunId from my Scala application without requiring the job creator to pass them as parameters.
I tried using
dbutils.notebook.getContext.tags("jobId")
dbutils.notebook.getContext.tags("runId")
But this did not work.

You can get the application ID with:
sc.applicationId
res0: String = app-000000000000-00000
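You can also get the stage ID and task attempt ID from TaskContext. This only works inside a running task (for example, within a function passed to map); on the driver, TaskContext.get returns null.
import org.apache.spark.TaskContext
TaskContext.get.stageId
TaskContext.get.taskAttemptId
You can find more options in SparkStatusTracker (available as sc.statusTracker); maybe that can help in your use case.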
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkStatusTracker.html

Related

Pass string parameter to remote process in kdb

I am trying to pass a string variable into an IPC query, but it does not work for me.
Example:
[`EDD.RDB; "?[`tab;enlist(like;`OrderId;",("string Number),");();(?:;`Actions)]"]
I am trying to query this RDB for rows where OrderId is like Number (a string).
Number is a parameter, but when I pass it as a string to the remote process, Number is no longer a string. I tried putting string in front of it but still get the same result.
What I want to pass to the remote process is this:
Number:"abc"
"?[`tab;enlist(like;`OrderId;"abc");();(?:;`Actions)]"
EDIT, since you have updated your question:
It's hard to give a solid answer here as your example is lacking information.
What you have posted is not a valid IPC call in KDB+. I suspect what you may be trying to run is something like:
h(`EDD.RDB; "?[`tab;enlist(like;`OrderId;",("string Number),");();(?:;`Actions)]"])
Assuming Number is an int (e.g. Number:123), you could rewrite it as:
h(`EDD.RDB;"select distinct Actions from tab where OrderId like \"",string[Number],"\"")
This is easier to read and work with. Provided Number is defined on the client side, the above should return a result.
If you do want to use the functional form, you could try something like:
"?[`tab;enlist (like;`OrderId;string[",string[Number],"]);1b;(enlist`Actions)!enlist`Actions]"
as your query string.
If Number is already a string in your client process, e.g. Number:"123" or Number:"abc", you should be able to use either:
h(`EDD.RDB;"select distinct Actions from tab where OrderId like \"",Number,"\"")
OR
h(`EDD.RDB;"?[`tab;enlist (like;`OrderId;\"",Number,"\");1b;(enlist`Actions)!enlist`Actions]")
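The key point in all of these variants is that the value of Number has to be spliced into the query text on the client before it is sent; writing `string Number` inside the string literal means it is evaluated on the remote process instead of on the client where Number is defined. As a language-neutral illustration, here is the same splicing sketched in Python for a hypothetical client that builds the query text before handing it to a kdb+ connection (the connection itself is not shown). The helper name build_like_query is made up for this example; it escapes backslashes and double quotes so the value survives as a q string literal.

```python
def build_like_query(table: str, column: str, pattern: str) -> str:
    """Return a q functional select equivalent to
    select distinct Actions from table where column like pattern.

    Hypothetical helper: the pattern is spliced in on the client side as a
    q string literal, so the remote process never needs to know `Number`.
    """
    # Escape backslashes first, then double quotes, so the q literal stays valid.
    escaped = pattern.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"?[`{table};enlist(like;`{column};\"{escaped}\");1b;"
        "(enlist`Actions)!enlist`Actions]"
    )


number = "abc"
query = build_like_query("tab", "OrderId", number)
print(query)
```

For number = "abc" this prints ?[`tab;enlist(like;`OrderId;"abc");1b;(enlist`Actions)!enlist`Actions], the same functional form as the last q variant above.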
Does the IPC query have to be a string? Passing parameters would be cleaner using the (func;params) syntax for IPC:
handleToRdb ({[number] ?[`tab;enlist(like;`OrderId;number);();(?:;`Actions)]};"abc")

Properties in TestResultsHttpClientBase.GetTestResultDetailsForBuildAsync are null

I am trying to get all test results from a specific Azure DevOps build. For every test result I need at least the following: AutomatedTestName, Outcome and ErrorMessage.
Running TestManagementHttpClient.GetTestResultDetailsForBuildAsync(this.ProjectId, this.build.Id) returns all tests, but almost all properties on the test results are null, e.g. AutomatedTestName.
TestManagementHttpClient.GetTestResultDetailsForBuildAsync(this.ProjectId, this.build.Id, shouldIncludeResults: true) is also not working.
Is there a way to load all properties?
A workaround is to call TestManagementHttpClient.GetTestResultsAsync(this.ProjectId, runId), but there the number of results per call is limited to 10,000, so I have to page until no more results are returned. This is a potential bottleneck and performs badly if we have, e.g., 500,000 tests.
You can load all properties via the following REST API:
GET https://dev.azure.com/{organization}/{project}/_apis/test/Runs/{runId}/results?api-version=5.1
With optional parameters:
GET https://dev.azure.com/{organization}/{project}/_apis/test/Runs/{runId}/results?detailsToInclude={detailsToInclude}&$skip={$skip}&$top={$top}&outcomes={outcomes}&api-version=5.1
If you have 500,000 tests, you can use the $skip URI parameter to specify how many test results to skip from the beginning, together with $top to set the page size.
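The $skip/$top paging loop can be sketched as follows. This is a minimal sketch, assuming a caller-supplied fetch_page function (hypothetical) that performs the authenticated GET and returns the list of results from one page (the "value" array of the response); authentication and HTTP handling are not shown. It stops at the first empty page rather than at the first short page.

```python
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode

API_VERSION = "5.1"


def results_url(organization: str, project: str, run_id: int,
                skip: int, top: int) -> str:
    """Build the Test Runs results URL with $skip/$top paging parameters."""
    # safe="$" keeps the parameter names readable as $skip/$top.
    query = urlencode(
        {"$skip": skip, "$top": top, "api-version": API_VERSION}, safe="$"
    )
    return (f"https://dev.azure.com/{organization}/{project}"
            f"/_apis/test/Runs/{run_id}/results?{query}")


def fetch_all_results(fetch_page: Callable[[str], List[Dict]],
                      organization: str, project: str, run_id: int,
                      page_size: int = 1000) -> Iterator[Dict]:
    """Yield every test result of a run, one page at a time.

    fetch_page is a hypothetical caller-supplied function: it GETs the URL
    (with authentication) and returns the list of results in that page.
    """
    skip = 0
    while True:
        page = fetch_page(results_url(organization, project, run_id,
                                      skip, page_size))
        if not page:
            break  # an empty page means there are no more results
        yield from page
        skip += len(page)  # advance by what was actually returned
```

Because the loop advances by the number of items actually returned, a server-side cap on $top smaller than page_size does not cause results to be skipped.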
Update 1:
A workaround is to call TestManagementHttpClient.GetTestResultsAsync(this.ProjectId, runId) but there the number of results are limited to 10000.
You can call TestManagementHttpClient.GetTestResultsAsync(this.ProjectId, runId, skip number), passing a skip value to specify how many test results to skip from the beginning.
Please refer to this doc for more details.

How to get the id of the run from within a component?

I'm doing some experimentation with Kubeflow Pipelines and I'm interested in retrieving the run id to save along with some metadata about the pipeline execution. Is there any way I can do so from a component like a ContainerOp?
You can use kfp.dsl.EXECUTION_ID_PLACEHOLDER and kfp.dsl.RUN_ID_PLACEHOLDER as arguments for your component. At runtime they will be replaced with the actual values.
I tried to do this using the Python DSL, but it seems that isn't possible right now.
The only option I found is the method used in this sample code: you declare a string containing {{workflow.uid}}, and it is replaced with the actual value at execution time.
You can do the same to get the pod name, using {{pod.name}}.
Since Kubeflow Pipelines relies on Argo, you can use Argo variables to get what you want.
For example,
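from kfp import dsl
from kfp.components import func_to_container_op

@func_to_container_op
def dummy(run_id: str, run_name: str) -> str:
    # The arguments arrive with the Argo placeholders already substituted.
    return run_id + ' ' + run_name

@dsl.pipeline(
    name='test_pipeline',
)
def test_pipeline():
    # Argo replaces these placeholder strings with the real values at runtime.
    dummy('{{workflow.labels.pipeline/runid}}', '{{workflow.annotations.pipelines.kubeflow.org/run_name}}')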
You will see that the placeholders are replaced with the correct run_id and run_name.
For more argo variables: https://github.com/argoproj/argo-workflows/blob/master/docs/variables.md
To see what is recorded in the labels and annotations of a Kubeflow Pipelines run, get the corresponding workflow from Kubernetes:
kubectl get workflow/XXX -oyaml
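If you prefer to read those values programmatically, kubectl can print the same object as JSON with -o json instead of -oyaml. The sketch below pulls the run id and run name out of that JSON, assuming the label and annotation keys used in the test_pipeline example above (pipeline/runid and pipelines.kubeflow.org/run_name); run_info_from_workflow is a hypothetical helper name.

```python
import json
from typing import Dict


def run_info_from_workflow(workflow_json: str) -> Dict[str, str]:
    """Return the run id and run name recorded on an Argo Workflow.

    workflow_json is the text printed by `kubectl get workflow/<name> -o json`.
    Missing keys come back as empty strings.
    """
    metadata = json.loads(workflow_json).get("metadata", {})
    labels = metadata.get("labels", {})
    annotations = metadata.get("annotations", {})
    return {
        # Keys assumed from the placeholders used in the pipeline example.
        "run_id": labels.get("pipeline/runid", ""),
        "run_name": annotations.get("pipelines.kubeflow.org/run_name", ""),
    }
```

For example, you could save the output of kubectl get workflow/XXX -o json to a file and pass its contents to the function.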
Alternatively, if you start the run from the SDK, create_run_from_pipeline_func returns a RunPipelineResult, which has a run_id attribute:
import kfp
client = kfp.Client(host)
result = client.create_run_from_pipeline_func(…)
result.run_id
Your component's container should have an environment variable called HOSTNAME set to its unique pod name, which you can use to derive the other metadata you need.
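Putting the placeholder and HOSTNAME ideas together, a component could record the substituted run id alongside its pod name. This is a minimal sketch: build_run_metadata is a hypothetical helper, run_id is assumed to be passed in as a component argument whose value was a placeholder such as {{workflow.uid}}, and the pod name is read from HOSTNAME. Passing env explicitly makes the function easy to test outside a cluster.

```python
import json
import os
from typing import Dict, Mapping, Optional


def build_run_metadata(run_id: str, extra: Dict[str, str],
                       env: Optional[Mapping[str, str]] = None) -> str:
    """Serialize run metadata to JSON (hypothetical helper).

    run_id: value of a placeholder argument such as '{{workflow.uid}}',
            already substituted by the time the container runs.
    env:    environment to read HOSTNAME from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    record = dict(extra)
    # Fixed fields are written last so `extra` cannot overwrite them.
    record["run_id"] = run_id
    record["pod_name"] = env.get("HOSTNAME", "")
    if run_id.startswith("{{"):
        # The placeholder was not substituted, e.g. when run outside Argo.
        record["warning"] = "run_id placeholder not substituted"
    return json.dumps(record, sort_keys=True)
```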

How to get an option's name in a Rundeck JOB?

I have a Rundeck job called "TEST".
It has an option called country.
This option retrieves a list of name/value pairs from a remote URL, as follows:
[
{"name":"FRANCE", "value":"FR"},
{"name":"ITALY", "value":"IT"},
{"name":"ALGERIA", "value":"DZ"}
]
I would like to use both the name and the value in a job step:
echo ${option.country.name}
echo ${option.country.value}
But this doesn't work, and I'm not able to get the name of the selected entry.
Getting the value works with ${option.country}.
Is there any trick to get the name (e.g. FRANCE) of the selected entry?
Answering for the record: the best approach is probably a script step that reads the JSON and extracts the name for the selected value. Alternatively, you can make the name and the value the same, as in this example (of course, this does not work for all cases).
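The script-step approach can be sketched in Python. This is a minimal sketch, assuming the script has the same JSON list available as text (how it gets it, e.g. by fetching the same remote URL, is left out) and that the selected value arrives through the RD_OPTION_COUNTRY environment variable, following Rundeck's RD_OPTION_<NAME> convention for script steps. Both function names are hypothetical.

```python
import json
import os
from typing import Mapping, Optional


def option_name_for_value(options_json: str, value: str) -> Optional[str]:
    """Return the "name" of the entry whose "value" matches, or None."""
    for entry in json.loads(options_json):
        if entry.get("value") == value:
            return entry.get("name")
    return None


def selected_country_name(options_json: str,
                          env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Look up the name for the currently selected country option."""
    env = os.environ if env is None else env
    # Rundeck passes the chosen option value (e.g. "FR") as RD_OPTION_COUNTRY.
    return option_name_for_value(options_json, env.get("RD_OPTION_COUNTRY", ""))
```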

How to pass context parameter to Query parameters value in Talend

In Talend Studio, I am creating a job that supplies a URL and query parameters to the 'tRest_Client' component, and I am facing the issue described below.
I am trying to pass context parameter data to a query parameter value, as below:
Context parameter -
Name : mis_id
Default : 10
Query Parameters -
name : "query"
value : {target-rel[=context.mis_id]}
Actual URL -
URL+query={target-rel[=10]}
Here I am trying to pass the value 10 via 'context.mis_id'.
When I run the Talend job, no data is passed to the value of the query parameter.
Please let me know whether this is the correct way to pass context parameter data.
If you need to include a context variable in a query parameter, remember how Java concatenates a string literal and a variable, since Talend is a Java-based tool.
Say in Java I have a variable:
String world = "World!!!";
System.out.println("Hello " + world); // This will display Hello World!!!
Likewise, in Talend you have to write:
"{target-rel[="+context.mis_id+"]}"
Hope this would help you out.