Pod "no2-pipeline-x5kpd-2954674781" is invalid: spec.volumes[3].name: Duplicate value: "no2-pvc" - kubernetes

Hi, I am trying to run a Kubeflow pipeline.
Two steps run in parallel and dump data to two different folders of a PVC; a third component then collects the data from those two folders, merges it, and dumps the merged data to another PVC folder.
Here is my pipeline code:
vop = dsl.VolumeOp(
    name='no2-pvc',
    resource_name='no2-pvc',
    size='100Gi',
    modes=dsl.VOLUME_MODE_RWO,
)

##LOADING POSITIVE DATA##
load_positive_data = dsl.ContainerOp(
    name='load_positive_data',
    image=load_positive_data_image,
    command="python",
    arguments=[
        "/app/load_positive_data.py",
    ],
    pvolumes={"/mnt/positive/": vop.volume},
).apply(gcp.use_gcp_secret("user-gcp-sa"))

##LOADING NEGATIVE DATA##
load_negative_data = dsl.ContainerOp(
    name='load_negative_data',
    image=load_negative_data_image,
    command="python",
    arguments=[
        "/app/load_negative_data.py",
    ],
    pvolumes={"/mnt/negative/": vop.volume},
).apply(gcp.use_gcp_secret("user-gcp-sa"))

##MERGING POSITIVE AND NEGATIVE DATA##
marge_pos_neg_data = dsl.ContainerOp(
    name='marge_pos_neg_data',
    image=marged_data_image,
    command="python",
    arguments=[
        "/app/merge_neg_pos.py",
    ],
    pvolumes={"/mnt/positive/": load_negative_data.pvolume, "/mnt/negative/": load_positive_data.pvolume},
    # volumes={'/mnt': vop.after(load_negative_data, load_positive_data)}
).apply(gcp.use_gcp_secret("user-gcp-sa")).after(load_positive_data, load_negative_data)

##PROCESSING MARGED DATA##
process_marged_data = dsl.ContainerOp(
    name='process_data',
    image=perpare_merged_data_image,
    command="python",
    arguments=[
        "/app/prepare_all_dataset.py",
    ],
    pvolumes={"/mnt/pos_neg": marge_pos_neg_data.pvolume},
).apply(gcp.use_gcp_secret("user-gcp-sa")).after(marge_pos_neg_data)
load-positive-data and load-negative-data work fine, but the marge-pos-neg-data step fails with the following error:
This step is in Error state with this message:
task 'no2-pipeline-x5kpd.marge-pos-neg-data'
errored: Pod "no2-pipeline-x5kpd-2954674781" is invalid:
spec.volumes[3].name: Duplicate value: "no2-pvc"
Hoping for your help to resolve the issue.

pvolumes={"/mnt/positive/": vop.volume}) and pvolumes={"/mnt/negative/": vop.volume}) was creating two separate pvc's.

Related

mock outputs in Terragrunt dependency

I want to use Terragrunt to deploy this example: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/main/examples/complete-kubernetes-addons/main.tf
So far I have been able to create the VPC/EKS resources without a problem; I separated each module into its own directory, and everything worked as expected.
When I tried to do the same for the kubernetes-addons module, I ran into an issue with a data source that tries to call the cluster and fails, since the cluster doesn't exist yet at that point.
Here's my terragrunt.hcl which I'm trying to execute for this specific module:
...
terraform {
  source = "git::git@github.com:aws-ia/terraform-aws-eks-blueprints.git//modules/kubernetes-addons?ref=v4.6.1"
}

locals {
  # Extract needed variables for reuse
  cluster_version = "${include.envcommon.locals.cluster_version}"
  name            = "${include.envcommon.locals.name}"
}

dependency "eks" {
  config_path = "../eks"
  mock_outputs = {
    eks_cluster_endpoint = "https://000000000000.gr7.eu-west-3.eks.amazonaws.com"
    eks_oidc_provider    = "something"
    eks_cluster_id       = "something"
  }
}

inputs = {
  eks_cluster_id       = dependency.eks.outputs.eks_cluster_id
  eks_cluster_endpoint = dependency.eks.outputs.eks_cluster_endpoint
  eks_oidc_provider    = dependency.eks.outputs.eks_oidc_provider
  eks_cluster_version  = local.cluster_version
  ...
}
The error that I'm getting here:
INFO[0035]
Error: error reading EKS Cluster (something): couldn't find resource

  with data.aws_eks_cluster.eks_cluster,
  on data.tf line 7, in data "aws_eks_cluster" "eks_cluster":
   7: data "aws_eks_cluster" "eks_cluster" {
The kubernetes-addons module deploys addons into an existing Kubernetes cluster. If you don't have a cluster running (and apparently you don't, since you're mocking the cluster_id variable), the aws_eks_cluster data source has nothing to read, which is exactly the error you're seeing.
You need to create the K8s cluster first; only then can you start deploying the addons.
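If you want to keep the mocks for plan-time validation while guaranteeing that apply never runs against a mocked (non-existent) cluster, Terragrunt's dependency block can restrict when mocks are substituted. A sketch, assuming an otherwise unchanged terragrunt.hcl:

dependency "eks" {
  config_path = "../eks"

  # Mocks are substituted only for these commands; `apply` will read
  # the real outputs from the eks module's state instead.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs = {
    eks_cluster_endpoint = "https://000000000000.gr7.eu-west-3.eks.amazonaws.com"
    eks_oidc_provider    = "something"
    eks_cluster_id       = "something"
  }
}

With that in place, running terragrunt run-all apply from the parent directory applies ../eks first (because of the dependency) and feeds its real outputs into the addons module.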

Referencing a loop object

I am currently checking out Tanka + Jsonnet, but every time I think I understand it, something new trips me up. Can somebody help me understand how to reference an object created in a loop? (Or is there a generally better solution?)
I am trying to create multiple deployments, each with a corresponding configMapVolumeMount, and I am not sure how to reference the corresponding configMap object here.
(Using a configVolumeMount it works, since that refers to the name, not the object.)
deployment: [
  deploy.new(
    name='demo-' + instance.name,
  )
  + deploy.configMapVolumeMount('config-' + instance.name, '/config.yml', k.core.v1.volumeMount.withSubPath('config.yml'))
  for instance in $._config.demo.instances
],
configMap: [
  configMap.new('config-' + instance.name, {
    'config.yml': (importstr 'files/config.yml') % {
      name: instance.name,
      ...
    },
  })
  for instance in $._config.demo.instances
]
regards
Great to read that you're making progress with Tanka; it's an awesome tool (once you've learned how to ride it, heh).
Below is a possible answer; see the inline comments in the code, in particular how we (ab)use Tanka's layout flexibility to populate a deploys: [...] array with Jsonnet objects, each containing a paired deployment and configMap.
config.jsonnet
{
  demo: {
    instances: ['foo', 'bar'],
    image: 'nginx',  // just as an example
  },
}
main.jsonnet
local config = import 'config.jsonnet';
local k = import 'github.com/grafana/jsonnet-libs/ksonnet-util/kausal.libsonnet';

{
  local deployment = k.apps.v1.deployment,
  local configMap = k.core.v1.configMap,

  _config:: import 'config.jsonnet',

  // my_deploy(name) will return a name-d deploy+configMap object
  my_deploy(name):: {
    local this = self,
    deployment:
      deployment.new(
        name='deploy-%s' % name,
        replicas=1,
        containers=[
          k.core.v1.container.new('demo-%s' % name, $._config.demo.image),
        ],
      )
      + deployment.configMapVolumeMount(
        this.configMap,
        '/config.yml',
        k.core.v1.volumeMount.withSubPath('config.yml')
      ),
    configMap:
      configMap.new('config-%s' % name)
      + configMap.withData({
        // NB: replacing `importstr 'files/config.yml';` with
        // a simple YAML multi-line string, just for the sake of having
        // a simple yet complete/usable example.
        'config.yml': |||
          name: %(name)s
          other: value
        ||| % { name: name },
      }),
  },

  // Tanka is pretty flexible with the "layout" of the Kubernetes objects
  // in the Environment (they can be arrays, objects, etc.); below an array
  // is used for simplicity (built via a loop/comprehension).
  deploys: [$.my_deploy(name) for name in $._config.demo.instances],
}
output
$ tk init
[...]
## NOTE: using https://kind.sigs.k8s.io/ local Kubernetes cluster
$ tk env set --server-from-context kind-kind environments/default
[... save main.jsonnet, config.jsonnet to ./environments/default/]
$ tk apply --dry-run=server environments/default
[...]
configmap/config-bar created (server dry run)
configmap/config-foo created (server dry run)
deployment.apps/deploy-bar created (server dry run)
deployment.apps/deploy-foo created (server dry run)

Error: "serving_default not found in signature def" when testing prediction

I followed this tutorial and came to the point where I can test a prediction using the following code:
{
"instances": [
{"csv_row": "44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct, Husband, Black, Male, 7688, 0, 40, United-States", "key": "dummy-key"}
]
}
However, I am getting the following error:
{
"error": "{ \"error\": \"Serving signature name: \\\"serving_default\\\" not found in signature def\" }"
}
I presume the input format doesn't match what is expected, but I am not entirely sure what the expected input should be.
Any ideas as to what is causing the example code to throw this error?
I finally figured it out: I loaded the TensorFlow model in a Jupyter notebook and printed out the signatures:
new_model = tf.keras.models.load_model('modelPath')
print(list(new_model.signatures.keys()))
the result was: [u'predict']
So the command I used to get a prediction is:
georg@Georgs-MBP ~ % gcloud ai-platform predict \
  --model $MODEL_NAME \
  --version "v1" \
  --json-instances sample_input.json \
  --format "value(predictions[0].classes[0])" \
  --signature-name "predict"
result:
Using endpoint [https://europe-west3-ml.googleapis.com/]
<=50K
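As a side note, the signatures can also be inspected without loading the model in Python, using TensorFlow's bundled saved_model_cli tool (modelPath below stands in for the export directory used above):

saved_model_cli show --dir modelPath --all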
To add signature serving_default:
import tensorflow as tf
m = tf.saved_model.load("tf2-preview_inception_v3_classification_4")
print(m.signatures) # _SignatureMap({}) - Empty
t_spec = tf.TensorSpec([None,None,None,3], tf.float32)
c_func = m.__call__.get_concrete_function(inputs=t_spec)
signatures = {'serving_default': c_func}
tf.saved_model.save(m, 'tf2-preview_inception_v3_classification_5', signatures=signatures)
# Test new model
m5 = tf.saved_model.load("tf2-preview_inception_v3_classification_5")
print(m5.signatures) # _SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(*, inputs) at 0x17316DC50>})

Probable causes for idempotent error by terraform for infra generation

We are using Terraform to launch ECS containers in AWS using a custom task definition.
Since we didn't need the full infrastructure to be launched every time, the part that only launches the ECS container was split out.
The launch was working correctly until the ECS launch code was split out; then the ECS service launch started failing with an idempotency error:
│ Error: error creating target service: error waiting for ECS service (sandbox) creation: InvalidParameterException: Creation of service was not idempotent.
│
│ with aws_ecs_service.ecs_service_target,
│ on aws_infra_ecs.tf line 100, in resource "aws_ecs_target" "ecs_service_target":
│ 100: resource "aws_ecs_target" "ecs_service_target" {
│
ECS service is defined somewhat like below:
resource "aws_ecs_service" "ecs_service_target" {
desired_count = 1
name = "target"
launch_type = "FARGATE"
cluster = data.aws_ecs_cluster.cluster_target.id
enable_ecs_managed_tasks = true
task_definition = aws_ecs_task_definition.target_taskdef.arn
platform_version = "1.4.0"
...
load_balancer {
...
target_group_arn = data.aws_lb_target_group.aws_target.arn
}
...
network_configuration {
...
security_groups = [ data.aws_security_group.target_sg.id ]
subnets = [ "subet-5767c3c2" ] # A dynamic subnet reference id is used here
}
depends_upon = [
var.second_service_name,
aws_ecs_task_definition.target_taskdef,
data.aws_efs_access_point.target_ap
]
...
}
I was expecting the problem to be one of the following:
The subnet selected may differ between runs due to variable-based selection
Use of indirect data references (rather than direct resource references) may cause an issue
A task definition JSON encoding issue
What might be other causes for such a problem?
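One more cause worth ruling out (an educated guess, since the full state isn't shown): ECS returns "Creation of service was not idempotent" when a create call reuses the name of a service that already exists in the cluster with a different configuration, for example a copy left behind after the ECS code was split out of the original Terraform state. You can check for a leftover service with the AWS CLI; the service name "target" is taken from the config above (the error message shows "sandbox", so substitute whichever name your real config uses):

aws ecs describe-services --cluster <your-cluster-name> --services target \
  --query 'services[].{name:serviceName,status:status}'

If an old copy shows up, either delete it or bring it back under management with terraform import aws_ecs_service.ecs_service_target <cluster-name>/target.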

Bazel k8s_object - Unable to publish images

I have this BUILD file:
package(default_visibility = ["//visibility:public"])

load("@npm_bazel_typescript//:index.bzl", "ts_library")

ts_library(
    name = "lib",
    srcs = glob(
        include = ["**/*.ts"],
        exclude = ["**/*.spec.ts"],
    ),
    deps = [
        "//packages/enums/src:lib",
        "//packages/hello/src:lib",
        "@npm//faker",
        "@npm//@types/faker",
        "@npm//express",
        "@npm//@types/express",
    ],
)

load("@io_bazel_rules_docker//nodejs:image.bzl", "nodejs_image")

nodejs_image(
    name = "server",
    data = [":lib"],
    entry_point = ":index.ts",
)

load("@io_bazel_rules_docker//container:container.bzl", "container_push")

container_push(
    name = "push_server",
    image = ":server",
    format = "Docker",
    registry = "gcr.io",
    repository = "learning-bazel-monorepo/server",
    tag = "dev",
)

load("@io_bazel_rules_k8s//k8s:object.bzl", "k8s_object")

k8s_object(
    name = "k8s_deploy",
    kind = "deployment",
    namespace = "default",
    template = ":server.yaml",
    images = {
        "deploy_server:do_not_delete": ":server",
    },
)
But when running the k8s_deploy rule I get this error:
INFO: Analyzed target //services/server/src:k8s_deploy (1 packages loaded, 7 targets configured).
INFO: Found 1 target...
Target //services/server/src:k8s_deploy up-to-date:
bazel-bin/services/server/src/k8s_deploy.substituted.yaml
bazel-bin/services/server/src/k8s_deploy
INFO: Elapsed time: 0.276s, Critical Path: 0.01s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
2019/12/22 07:45:14 Unable to publish images: unable to publish image deploy_server:do_not_delete
The lib, server and push_server rules work fine, so I don't know what the issue is, as there is no specific error message.
A snippet out of my server.yaml file:
spec:
containers:
- name: server
image: deploy_server:do_not_delete
You can try it yourself by running bazel run //services/server/src:k8s_deploy on this repo: https://github.com/flolude/minimal-bazel-monorepo/tree/de898eb1bb4edf0e0b1b99c290ff7ab57db81988
Have you pushed images using this syntax before?
I'm used to using the full repository tag for both the server.yaml and the k8s_object images.
So, instead of just "deploy_server:do_not_delete", try "gcr.io/learning-bazel-monorepo/deploy_server:do_not_delete".
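Applied to the BUILD file and server.yaml above, that would look something like this (a sketch; the repository path is taken from the question's container_push rule, so adjust it to wherever push_server actually publishes):

k8s_object(
    name = "k8s_deploy",
    kind = "deployment",
    namespace = "default",
    template = ":server.yaml",
    images = {
        # Full registry/repository tag, matching the image reference
        # used in server.yaml below.
        "gcr.io/learning-bazel-monorepo/deploy_server:do_not_delete": ":server",
    },
)

and in server.yaml:

spec:
  containers:
  - name: server
    image: gcr.io/learning-bazel-monorepo/deploy_server:do_not_delete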