My Marathon app definition has a uri array. One of the resources in it is periodically updated (read: replaced). I need to force Marathon to re-fetch the URI and replace the resource in the sandbox.
From Marathon Resources Basic, I understand that Marathon uses the Mesos fetcher to do this when the app is restarted. However, I have read the docs and found no way of doing the same without restarting the app.
One way I can think of is to replace the resource in the sandbox myself, without relying on Marathon.
There is no way to tell Marathon to re-fetch your URI resources periodically without an app restart.
You can run a daemon in your app that fetches the URI resource and, if needed, extracts it into the Mesos sandbox path.
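A minimal sketch of such a daemon, assuming the resource can be downloaded over a plain URL and needs no extraction. The function names, the file name, and the 300-second interval are hypothetical; MESOS_SANDBOX is the environment variable Mesos sets for a task, and the sketch falls back to the current directory when it is absent.

```python
import os
import shutil
import tempfile
import time
import urllib.request


def refresh_resource(uri, sandbox_dir, filename):
    """Download `uri` and atomically replace `filename` inside `sandbox_dir`."""
    target = os.path.join(sandbox_dir, filename)
    # Write to a temp file in the same directory so os.replace stays atomic
    # and readers never observe a half-written resource.
    fd, tmp_path = tempfile.mkstemp(dir=sandbox_dir, prefix=filename + ".")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(uri) as resp:
            shutil.copyfileobj(resp, tmp)
        os.replace(tmp_path, target)
    except Exception:
        # Clean up the partial download and keep the previous copy in place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target


def run_forever(uri, filename, interval_seconds=300):
    """Hypothetical daemon loop: re-fetch the resource every `interval_seconds`."""
    sandbox_dir = os.environ.get("MESOS_SANDBOX", os.getcwd())
    while True:
        try:
            refresh_resource(uri, sandbox_dir, filename)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"refresh failed, keeping old copy: {exc}")
        time.sleep(interval_seconds)
```

The loop would be started as a background process or thread alongside the app; whether the app picks up the replaced file without a reload depends on how it reads the resource.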
During a pipeline run, specifying a deployment environment on a deployment job eliminates the need to provide a service connection manually. I'd guess it either creates a new SC at that point, or it created an SC when the environment was created and reuses it.
Either way, is there a way to find out which service connection is being used, from the pipeline run logs or from anywhere else?
In our setup I see a lot of service connections for a single environment, and a cleanup is necessary to get things in order.
I tried specifying the SC manually along with the environment and it works as expected, so going forward I can use this method. But for the cleanup, I'd still like to know which one gets used when none is specified! (None of the auto-created SCs show any execution history, but I know the deployment has happened multiple times.)
Since a Kubernetes resource in an environment references a Kubernetes service connection, you can use this API to get the serviceEndpointId of the Kubernetes resource, which is also the ID of the referenced service connection.
GET https://dev.azure.com/{organization}/{project}/_apis/distributedtask/environments/{environmentId}/providers/kubernetes/{resourceId}?api-version=7.0
Using the serviceEndpointId value from the response of the above API, we can then call this API to get the details of the referenced service connection.
GET https://dev.azure.com/{organization}/{project}/_apis/serviceendpoint/endpoints/{endpointId}?api-version=7.0
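The two calls above can be chained as in the sketch below. It assumes authentication with a personal access token sent as HTTP Basic auth with an empty user name; the function names are hypothetical, and only the serviceEndpointId field mentioned above is read from the first response.

```python
import base64
import json
import urllib.request

API_VERSION = "7.0"


def kubernetes_resource_url(organization, project, environment_id, resource_id):
    """URL of the Kubernetes resource inside an environment (first API above)."""
    return (
        f"https://dev.azure.com/{organization}/{project}/_apis/distributedtask/"
        f"environments/{environment_id}/providers/kubernetes/{resource_id}"
        f"?api-version={API_VERSION}"
    )


def service_endpoint_url(organization, project, endpoint_id):
    """URL of the service connection details (second API above)."""
    return (
        f"https://dev.azure.com/{organization}/{project}/_apis/serviceendpoint/"
        f"endpoints/{endpoint_id}?api-version={API_VERSION}"
    )


def make_get_json(pat):
    """Return a GET helper that authenticates with a PAT (an assumption)."""
    token = base64.b64encode(f":{pat}".encode()).decode()

    def get_json(url):
        req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    return get_json


def find_service_connection(get_json, organization, project, environment_id, resource_id):
    """Resolve the service connection referenced by a Kubernetes resource."""
    resource = get_json(
        kubernetes_resource_url(organization, project, environment_id, resource_id)
    )
    endpoint_id = resource["serviceEndpointId"]
    return get_json(service_endpoint_url(organization, project, endpoint_id))
```

For example, `find_service_connection(make_get_json(my_pat), "my-org", "my-project", 7, 42)` would return the service connection details as a dictionary; the organization, project, and IDs here are placeholders.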
We are developing a microservice-based system that is orchestrated using Kubernetes. Part of our use case is supplying our clients with an on-premise installation, where they receive an image (VMDK / QCOW2) with the whole system deployed.
One of our main challenges is handling the update process of such a system. The current plan is to have an API endpoint that receives an encrypted and signed package containing all the images and an update shell script. The API endpoint will start an asynchronous process that extracts the images and executes the shell script, which eventually calls Kubernetes to update all the images with the new code.
The question is where this API endpoint should be defined?
Be in a special "Maintenance" service that lives outside of Kubernetes and controls it; this service would be updated last in case its own code also needs updating.
Be part of one of the microservice containers that run inside Kubernetes, but then this image can itself be among the updated images, so any API that should return the update status may be unavailable.
What is the common way to export an interface to System Update or System Deployment wizard processes?
Thanks!
The Cloud Composer documentation explicitly states that:
Due to an issue with the Kubernetes Python client library, your Kubernetes pods should be designed to take no more than an hour to run.
However, it doesn't provide any more context than that, and I can't find a definitively relevant issue on the Kubernetes Python client project.
To test it, I ran a pod for two hours and saw no problems. What issue creates this restriction, and how does it manifest?
I'm not deeply familiar with either the Cloud Composer or Kubernetes Python client library ecosystems, but sorting the GitHub issue tracker by most comments shows this open item near the top of the list: https://github.com/kubernetes-client/python/issues/492
It sounds like there is a token expiration issue:
@yliaog this is an issue for us, as we are running kubernetes pods as
batch processes and tracking the state of the pods with a static
client. Once the client object is initialized, it does no refresh, and
therefore any job that takes longer than 60 minutes will fail. Looking
through python-base, it seems like we could make a wrapper class that
generates a new client (or refreshes the config) every n minutes, or
checks status prior to every call (as @mvle suggested). The best fix
would be in swagger-codegen, but a temporary solution would probably
be very useful for a lot of people.
- @flylo, https://github.com/kubernetes-client/python/issues/492#issuecomment-376581140
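The wrapper idea from the quoted comment (generate a new client every n minutes) could look roughly like the sketch below. This is not code from the issue: the class name, the `factory` callable, and the 30-minute default are hypothetical, and the sketch is deliberately independent of the Kubernetes client so it only shows the refresh logic.

```python
import time


class RefreshingClient:
    """Rebuild the wrapped client once it is older than max_age_seconds."""

    def __init__(self, factory, max_age_seconds=1800, clock=time.monotonic):
        # factory: zero-argument callable that builds a fresh client. With the
        # Kubernetes Python client it might reload the config and return a new
        # API object (an assumption, not shown here).
        self._factory = factory
        self._max_age = max_age_seconds
        self._clock = clock
        self._client = None
        self._created_at = None

    def get(self):
        """Return the current client, rebuilding it first if it is too old."""
        now = self._clock()
        if self._client is None or now - self._created_at >= self._max_age:
            self._client = self._factory()
            self._created_at = now
        return self._client
```

Callers would use `wrapper.get()` before each API call instead of holding on to one static client, so a token older than the chosen age is never reused.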
https://issues.apache.org/jira/browse/AIRFLOW-3253 is the reason (and hopefully, my fix will be merged soon). As the others suggested, this affects anyone using the Kubernetes Python client with GCP auth. If you are authenticating with a Kubernetes service account, you should see no problem.
If you are authenticating via a GCP service account with gcloud (e.g. using the GKEPodOperator), you will generally see this problem with jobs that take longer than an hour because the auth token expires after an hour.
There are more insights here too.
Currently, long-running jobs on GKE always eventually fail with a 404 error (https://bitbucket.org/snakemake/snakemake/issues/932/long-running-jobs-on-kubernetes-fail). We believe that the problem is in the Kubernetes client, as we determined that although _refresh_gcp_token is being called when the token is expired, the next API call still fails with a 404 error.
You can see here that Snakemake uses the kubernetes python client.
I have created a Spring Batch application (with partitioning), based on this example: https://github.com/mminella/S3JDBC. My app reads some files from an object store, does some processing, and writes the results back to the object store. With local partitioning, the app works fine on my machine.
To run it on Cloud Foundry, I changed the Maven configuration, made the changes for the deployer partition handler and the step execution listener, and deployed it on PCF.
But while trying to push and run the app on PCF, I am getting this issue:
Failing URI /v2/info. When I logged the error, I found that there is one call made by my app, e.g. https://mypcf.com:443/v2/info, and after that it fails. I can't provide full logs because of some restrictions. So I want to know:
To deploy a Spring Batch app in PCF, is there any extra configuration needed besides the Maven dependency and the code changes for DeployerPartitionHandler, StepExecutionListener and #cloudtask? The dependency is: org.springframework.cloud:spring-cloud-deployer-cloudfoundry:1.1.0.M1
Is it mandatory to have a separate database service like MySQL for the partitioned job? Can't I use H2 (the default one, if I don't configure anything)?
Do I need to do any configuration in PCF to support running multiple partitions?
As I am running remote partitioning, can I run the app locally from STS or IntelliJ (not on PCF Dev) so that it launches the workers in PCF (remote)? (Sorry for the stupid question, I am new to PCF.)
Thanks for checking out my example. To answer your questions:
You should be able to use the latest deployer release (instead of that rather old version).
Yes. Partitioned steps need to all be able to share the same job repository data store so an in memory database like H2 will not work for that use case.
Besides defining your datasource, that's all that is required to live in PCF. That being said, there are other things that need to be configured, but you can use other mechanisms to do so (Spring Cloud Config Server, application.properties/yml, etc).
Yes, you should be able to run the master locally and have it deploy the workers onto PCF if you're using the CF deployer.
Is there any option in Azure Service Fabric to delay deprovisioning? I have a microservice application hosted in Service Fabric, with instances distributed across different nodes. If I try to disengage/deprovision the service from the portal, can Service Fabric internally check whether a transaction is in progress on any of the instances and, if so, wait for it to complete? Also, if Microsoft does not provide such a feature, is there a PowerShell command to check the instance status?
Thanks
I assume that by "disengage/deprovision the service from portal" you are referring to deleting the service via the Service Fabric Explorer web app (perhaps via a link followed from the portal). Please correct me if this is wrong.
To answer your question directly, the framework will not wait for in-flight operations to complete during a service delete. Every replica for the service will lose its read and write permissions, causing all in-flight operations to fail. We do not offer a way to stall during this step in order to, for example, allow currently open transactions to be completed.
The reason we do not offer this semantic is that service deletion is expected to be rare and permanent, and delaying deletion for the final operation doesn't enable any additional scenarios. If a client is attempting operations on a service being deleted, then either:
The last client operation may fail due to delete racing and revoking read/write permissions
Every subsequent client operation will fail due to the service no longer existing
or
The last client operation will succeed due to deletion being delayed
Every subsequent client operation will fail due to the service no longer existing
The expectation is that any client or dependent service should have already been updated or deleted prior to deleting the service they depend on, as you are making the permanent decision that this service should no longer exist.