Somehow, if I specify resources in my KubernetesPodOperator, the DAG fails. It looks like the pod is created, or at least it attempts to create it. The log says Event: XXXXX-e59a4be6 had an event of type Pending.
resource_config = {'limit_memory': 1, 'limit_cpu': 1, 'request_memory': 1, 'request_cpu': 1}
dagA = KubernetesPodOperator(
name="podA", namespace='my-app', task_id="task1", resources=resource_config,
...
If I don't specify the resources, it runs. Looking at the code, the resources param is of type dictionary.
Did anyone have this issue?
I found a solution. Specifying whole integer values does not work, based on the examples here. This resource spec works:
resource_config = {'limit_memory': '1024Mi', 'limit_cpu': '500m'}
So I think that is the correct way to specify the values.
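For reference, a minimal sketch of the full operator call with unit-suffixed values (the image and the dag reference are placeholders; older contrib releases of the operator take this plain dict, while newer provider releases expect a Kubernetes resource-requirements object instead):
from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator

resource_config = {'limit_memory': '1024Mi', 'limit_cpu': '500m',
                   'request_memory': '512Mi', 'request_cpu': '250m'}

podA = KubernetesPodOperator(
    name="podA",
    namespace='my-app',
    task_id="task1",
    image="busybox",            # placeholder image
    resources=resource_config,  # unit-suffixed strings, not bare integers
    dag=dag,                    # assumes an existing DAG object named dag
)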
We have a REST resource
/tasks/{task-type}
and only GET methods are available:
GET /tasks/{task-type}
GET /tasks/{task-type}/{id}
The Task entity contains meta info like created, finished, status, ref key, and try counts for scheduled tasks.
Now we are faced with a problem: a task may contain incorrect data, and then its execution always fails.
Because the scheduler invokes tasks every 5 minutes, there are a lot of errors in the logs, and the largest try counts are around 500k. The solution I found is to limit try_count to five (for example). Now we need a way to manually reset the try count to zero. I found two solutions:
1.
PATCH /tasks/{task-type}/{id}/discard-try-count - no response body
This solution looks pretty simple, but it violates the REST convention, because we use an action (verb) in the naming. And if we need to change other fields, we will end up with a lot of endpoints in this style.
2a.
PATCH /tasks/{task-type}/{id}
body:
{
"tryCounts": int
}
This is how REST wants to see it, and we can easily add new fields to modify, but now the client can set any value for tryCounts.
2b.
PATCH /tasks/{task-type}/{id}
body:
{
"tryCounts": int // validate that try count can be only zero
}
This differs from the previous one only by the presence of validation.
This looks like the most reliable solution. Is it really the best fit?
The non-verb convention is not a standard; you can violate it if you want to. It can also be worked around very simply: just convert the verb into a noun and you will be fine, something like:
POST /tasks/{task-type}/{id}/try-count-discarding
Another way is setting the try count to zero:
PUT /tasks/{task-type}/{id}/try-count 0
Yet another solution is combining the two, which I like the most:
PATCH /tasks/{task-type}/{id}/try-count {"op": "reset"}
Or another variant:
PATCH /tasks/{task-type}/{id} {"op": "discard-try-count"}
AKS = 1.17.9
Prometheus = 2.16.0
kube-state-metrics = 1.8.0
My use case: I want to alert when one of my persistent volumes is not in a "Bound" phase, and only when it falls within a predefined set of namespaces.
This got me to my first attempt at joining Prometheus metrics - so, please bear with me : )
I opted to use the following to obtain the pv phase:
kube_persistentvolume_status_phase{phase="Bound",job="kube-state-metrics"}
Renders:
kube_persistentvolume_status_phase{instance="10.147.5.110:8080",job="kube-state-metrics",persistentvolume="pvc-33197ae6-d42a-777e-b8ca-efbd66a8750d",phase="Bound"} 1
kube_persistentvolume_status_phase{instance="10.147.5.110:8080",job="kube-state-metrics",persistentvolume="pvc-165d5006-erd4-481e-8acc-eed4a04a3bce",phase="Bound"} 1
This worked well, except for the fact that it does not include the namespace.
So I managed to determine the persistentvolumeclaim namespaces with this:
kube_persistentvolumeclaim_info{namespace=~"monitoring|vault"}
Renders:
kube_persistentvolumeclaim_info{instance="10.147.5.110:8080",job="kube-state-metrics",namespace="vault",persistentvolumeclaim="vault-file",storageclass="default",volumename="pvc-33197ae6-d42a-777e-b8ca-efbd66a8750d"} 1
kube_persistentvolumeclaim_info{instance="10.147.5.110:8080",job="kube-state-metrics",namespace="monitoring",persistentvolumeclaim="prometheus-prometheus-db-prometheus-prometheus-0",storageclass="default",volumename="pvc-165d5006-erd4-481e-8acc-eed4a04a3bce"} 1
So my idea was to join these sets with the matching values in the following fields:
(kube_persistentvolume_status_phase)persistentvolume
on
(kube_persistentvolumeclaim_info)volumename
BUT, if I understood it correctly, you are only able to join two metric sets on labels that match exactly (both name and value). I hence opted for the "instance" and "job" labels, as these were common to both sides and matching.
kube_persistentvolume_status_phase{phase!="Bound",job="kube-state-metrics"} * on(instance,job) group_left(namespace) kube_persistentvolumeclaim_info{namespace=~"monitoring|vault"}
Renders:
Error executing query: found duplicate series for the match group {instance="10.147.5.110:8080" , job="kube-state-metrics"} on the right hand-side of the operation: [{__name__="kube_persistentvolumeclaim_info", instance="10.147.5.110:8080", job="kube-state-metrics", namespace="monitoring", persistentvolumeclaim="alertmanager-prometheusam-db-alertmanager-prometheusam-0", storageclass="default", volumename="pvc-b8406fb8-3262-7777-8da8-151815e05d75"}, {__name__="kube_persistentvolumeclaim_info", instance="10.147.5.110:8080", job="kube-state-metrics", namespace="vault", persistentvolumeclaim="vault-file", storageclass="default", volumename="pvc-33197ae6-d42a-777e-b8ca-efbd66a8750d"}];many-to-many matching not allowed: matching labels must be unique on one side
So in all fairness, the query does communicate well what the problem is - so I attempted to solve this with the "ignoring" option, attempting to keep only the matching labels and values (instance and job) and "excluding/ignoring" the non-matching ones on both sides. This did not work either, resulting in a parsing error, which in turn nudged me to take a step back and reassess what I am doing.
I am just a bit concerned that I am perhaps barking up the wrong tree here.
My question is: Is this at all possible and if so how? or is there perhaps another, more prudent way to achieve this?
Thanks in advance!
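One possible direction (a sketch, not a verified answer): PromQL only matches on label names that exist on both sides, but label_replace can copy the volumename label into a persistentvolume label on the claim-info side, so the join key becomes the volume name instead of instance/job:
kube_persistentvolume_status_phase{phase!="Bound",job="kube-state-metrics"}
  * on(persistentvolume) group_left(namespace)
    label_replace(kube_persistentvolumeclaim_info{namespace=~"monitoring|vault",volumename!=""}, "persistentvolume", "$1", "volumename", "(.+)")
A bound PV belongs to at most one claim, so the right-hand side stays unique per persistentvolume and the many-to-many error should go away.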
I'm doing some experimentation with Kubeflow Pipelines and I'm interested in retrieving the run id to save along with some metadata about the pipeline execution. Is there any way I can do so from a component like a ContainerOp?
You can use kfp.dsl.EXECUTION_ID_PLACEHOLDER and kfp.dsl.RUN_ID_PLACEHOLDER as arguments for your component. At runtime they will be replaced with the actual values.
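For example, a minimal sketch (the echo_run_id component is illustrative, not part of the kfp samples):
from kfp import dsl
from kfp.components import func_to_container_op

@func_to_container_op
def echo_run_id(run_id: str) -> str:
    return run_id

@dsl.pipeline(name='run-id-demo')
def pipeline():
    # The placeholder is replaced with the actual run id when the pipeline executes.
    echo_run_id(dsl.RUN_ID_PLACEHOLDER)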
I tried to do this using the Python DSL, but it seems that isn't possible right now.
The only option that I found is to use the method from this sample code: you basically declare a string containing {{workflow.uid}}, and it will be replaced with the actual value at execution time.
You can also do this in order to get the pod name, it would be {{pod.name}}.
Since Kubeflow Pipelines relies on Argo, you can use Argo variables to get what you want.
For example,
from kfp import dsl
from kfp.components import func_to_container_op

@func_to_container_op
def dummy(run_id, run_name) -> str:
    return 'run_id: {}, run_name: {}'.format(run_id, run_name)

@dsl.pipeline(
    name='test_pipeline',
)
def test_pipeline():
    dummy('{{workflow.labels.pipeline/runid}}', '{{workflow.annotations.pipelines.kubeflow.org/run_name}}')
You will find that the placeholders will be replaced with the correct run_id and run_name.
For more argo variables: https://github.com/argoproj/argo-workflows/blob/master/docs/variables.md
To know what is recorded in the labels and annotations of a Kubeflow pipeline run, just get the corresponding workflow from k8s:
kubectl get workflow/XXX -oyaml
create_run_from_pipeline_func returns a RunPipelineResult, which has a run_id attribute:
import kfp

client = kfp.Client(host)
result = client.create_run_from_pipeline_func(…)
result.run_id
Your component's container should have an environment variable called HOSTNAME that is set to its unique pod name, from which you can derive all the necessary metadata.
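For instance, from inside a Python-based component (a sketch, assuming the component code is Python):
import os

pod_name = os.environ.get('HOSTNAME')  # the step's pod name, usable as a key for your metadata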
How do I take a list of values, iterate through it to create the needed objects, and then pass that "list" of objects to the API to create multiple rows?
I have been successful in adding a new row with a value using the API example. In that example, two objects are created.
row_a = ss_client.models.Row()
row_b = ss_client.models.Row()
These two objects are passed to the add_rows function. (Forgive me if I use the wrong terms; still new to this.)
response = ss_client.Sheets.add_rows(
    2331373580117892,  # sheet_id
    [row_a, row_b])
I have not been successful in passing an unknown number of objects with something like this.
newRowsToCreate = []
for row in new_rows:
    rowObject = ss.models.Row()
    rowObject.cells.append({
        'column_id': PM_columns['Row ID Master'],
        'value': row
    })
    newRowsToCreate.append(rowObject)

# Add rows to sheet
response = ss.Sheets.add_rows(
    OH_MkrSheetId,  # sheet_id
    newRowsToCreate)
This returns this error:
{"code": 1062, "errorCode": 1062, "message": "Invalid row location: You must
use at least 1 location specifier.",
Thank you for any help.
From the error message, it looks like you're missing the location specification for the new rows.
Each row object that you create needs to have a location value set. For example, if you want your new rows to be added to the bottom of your sheet, then you would add this attribute to your rowObject.
rowObject.toBottom=True
You can read about this location-specific attribute and how it relates to the Python SDK here.
To be 100% precise here I had to set the attribute differently to make it work:
rowObject.to_bottom = True
I've found the name of the property below:
https://smartsheet-platform.github.io/smartsheet-python-sdk/smartsheet.models.html#module-smartsheet.models.row
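Putting the two together, the loop from the question would then look something like this (a sketch reusing the question's own names):
newRowsToCreate = []
for row in new_rows:
    rowObject = ss.models.Row()
    rowObject.to_bottom = True  # location specifier: append the row at the bottom of the sheet
    rowObject.cells.append({
        'column_id': PM_columns['Row ID Master'],
        'value': row
    })
    newRowsToCreate.append(rowObject)

response = ss.Sheets.add_rows(
    OH_MkrSheetId,  # sheet_id
    newRowsToCreate)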
To be 100% precise here I had to set the attribute differently to make it work:
Yep, the documentation isn't super clear about this other than in the examples. The API uses camelCase in JavaScript, but the same terms are always in snake_case in the Python SDK (which is, after all, the Pythonic way to do it!).
and thanks for looking!
I have an instance of YouTrack with several custom fields, some of which are String-type. I'm implementing a module to create a new issue via the YouTrack REST API's PUT request, and then updating its fields with user-submitted values by applying commands. This works great---most of the time.
I know that I can apply multiple commands to an issue at the same time by concatenating them into the query string, like so:
Type Bug Priority Critical add Fix versions 5.1 tag regression
will result in
Type: Bug
Priority: Critical
Fix versions: 5.1
in their respective fields (as well as adding the regression tag). But, if I try to do the same thing with multiple String-type custom fields, then:
Foo something Example Something else Bar P0001
results in
Foo: something Example Something else Bar P0001
Example:
Bar:
The command only applies to the first field, and the rest of the query string is treated like its String value. I can apply the command individually for each field, but is there an easier way to combine these requests?
Thanks again!
This is an expected result: the entire string after Foo is considered the value of this field, because spaces are also valid characters in string custom fields.
If you try to apply this command via the command window in the UI, you will actually see the same result.
Such a good question.
I encountered the same issue and have spent an unhealthy amount of time in frustration.
Using the command window in the YouTrack UI, I noticed it leaves trailing quotation marks, and I was unable to find anything in the documentation that discussed finalizing or identifying the end of a string value. I was also unable to find any mention of setting string field values in the command reference, grammar documentation, or examples.
For my solution I am using Python with the requests and urllib modules, though I expect you could port the solution to any language.
The REST API will accept explicitly quoted strings in the POST:
import requests
import urllib  # Python 2; on Python 3 this would be urllib.parse
from collections import OrderedDict

URL = 'http://youtrack.your.address:8000/rest/issue/{issue}/execute?'.format(issue='TEST-1234')

# Note: building an OrderedDict from a dict literal does not actually preserve
# this order (as the resulting URL below shows), but the command accepts the
# field/value pairs in any order, so it does not matter here.
params = OrderedDict({
    'State': 'New',
    'Priority': 'Critical',
    'String Field': '"Message to submit"',  # the explicit quotes mark the string boundaries
    'Other Details': '"Fold the toilet paper to a point when you are finished."'
})

# Join into "Field value Field value ..." and URL-encode it as the command parameter.
str_cmd = ' '.join(' '.join([k, v]) for k, v in params.items())
command_url = URL + urllib.urlencode({'command': str_cmd})

result = requests.post(command_url)

# The command result:
# http://youtrack.your.address:8000/rest/issue/TEST-1234/execute?command=Priority+Critical+State+New+String+Field+%22Message+to+submit%22+Other+Details+%22Fold+the+toilet+paper+to+a+point+when+you+are+finished.%22
I'm sad to see this one go unanswered for so long. - Hope this helps!
edit:
After continuing my work, I have concluded that sending all the field updates as a single POST is marginally better for the YouTrack server, but requires more effort than it's worth:
1) you have to know which fields in the issues hold string values
2) you have to pre-process all the string values into quoted string literals
3) if just one field in the combined request is missing, fails to set, or has an unexpected value, the entire request fails and you potentially lose all the other information.
I wish the YouTrack documentation had some mention or discussion of these considerations.