Deployment gaps in a fast-growing application [closed] - kubernetes

Some context:
I have little experience with CI/CD and have managed a fast-growing application since the day it first saw the light. It is composed of several microservices deployed to different environments. Devs are constantly pushing new code to DEV, but they frequently forget to send new values from their local .env over to the OpenShift cloud, whether the target is a brand new environment or an existing one.
The outcome? Services that fail because their secrets were never updated.
I understand the underlying issue is a lack of communication between us on the DevOps staff and the devs themselves, but I've been trying to figure out some sort of process to make sure we are not missing anything. Maybe something like a "before takeoff" checklist (yes, like the ones pilots run through during real flight preparation): if the check fails, the aircraft is not ready for takeoff.
So the question is for everyone out there who practices DevOps: how do you deal with this?
Does anyone automate this within OpenShift/Kubernetes, for example? From your perspective and experience, would you suggest any tools for that, or simply enforce communication?

I guess no checklist or communication would work for a team that "...frequently forget[s] to send new values from their local .env over...", which you must already have tried.
A step in your pipeline should check for service availability before proceeding to the next step, e.g. does the service have an endpoint registered within an acceptable time? No endpoint means the backing pod(s) did not enter the ready state as expected. In that case, roll back, send a notification to the team responsible for the service/application, and exit cleanly.
There's no fixed formula for CI/CD, especially where human error is involved. Checks and balances at every step are the least you can do to trigger early warnings and avoid a disastrous deployment.
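As a rough illustration of that kind of gate, here is a minimal sketch using the official Kubernetes Python client. The namespace, service, and deployment names are placeholders, the rollback is simply a kubectl rollout undo invoked via subprocess, and the notification step is left as a comment; adapt it to oc or whatever your pipeline tooling provides.

```python
# Hedged sketch of a "before takeoff" pipeline gate: wait for the Service to
# expose ready endpoints, otherwise roll back the deployment and fail the step.
# Assumes the `kubernetes` Python client and credentials from kubeconfig;
# all names below are placeholders.
import subprocess
import time

from kubernetes import client, config

NAMESPACE = "dev"           # placeholder
SERVICE = "my-service"      # placeholder
DEPLOYMENT = "my-service"   # placeholder
TIMEOUT_SECONDS = 120


def service_has_ready_endpoints(v1: client.CoreV1Api) -> bool:
    """True if the Service has at least one ready endpoint address."""
    endpoints = v1.read_namespaced_endpoints(SERVICE, NAMESPACE)
    for subset in endpoints.subsets or []:
        if subset.addresses:  # only ready addresses land here
            return True
    return False


def main() -> int:
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    v1 = client.CoreV1Api()

    deadline = time.time() + TIMEOUT_SECONDS
    while time.time() < deadline:
        if service_has_ready_endpoints(v1):
            print("Service is ready; proceeding to the next pipeline step.")
            return 0
        time.sleep(5)

    # No ready endpoints in time: the backing pods never became ready.
    print("No ready endpoints within the timeout; rolling back.")
    subprocess.run(
        ["kubectl", "rollout", "undo", f"deployment/{DEPLOYMENT}", "-n", NAMESPACE],
        check=True,
    )
    # Notify the owning team here (chat webhook, email, ...), then exit cleanly.
    return 1


if __name__ == "__main__":
    raise SystemExit(main())
```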

Related

How to configure resources in a pool to handle several agents [closed]

I am trying to simulate a call center with chat, and in this scenario a customer service representative can serve multiple customer chats at the same time, depending on their capabilities.
I started by creating an Employee agent and built on this, but I could not simulate a scenario in which one "Employee" agent can serve several client "chat" agents at the same time based on their total capacity, as in a real chat call center ...
Please advise how I can configure the logic so that several agents can seize/delay one resource, or create a block in which the employee agent goes through each chat and checks whether it can be released.
Thanks in advance
This is a more advanced question and not that easy to answer in detail without building a lot of logic and functionality.
Overall I can suggest the following design, but depending on your level of expertise in AnyLogic (and Java) this might not be the best design, and I am curious to see whether anyone will venture any other options. But for a moderate user (and use case), this design will be sufficient.
Since there is no way to do what you asked with a standard resource pool, I would suggest setting up a resource pool inside a new agent type; then, either as a population or graphically (as per my design), you can send chats to these agents. Since each agent has a resource pool inside it, you can define the number of chats an agent can handle in a parameter of the agent, which sets the number of resources in its resource pool.
You can then have a function that takes a chat from the queue and gives it to the first available agent that has capacity.
You call this function whenever something arrives in the queue, whenever a chat leaves an agent, and also whenever an agent gets a new chat, since multiple chats might arrive at the same time and we only send the first one.
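The AnyLogic implementation itself is Java inside the model, but the dispatch idea is simple enough to sketch in plain Python: a queue of waiting chats, agents with a capacity parameter, and one function that hands the first waiting chat to the first agent with free capacity, called on every arrival and every completion. The names and data structures below are illustrative only, not AnyLogic API.

```python
# Illustrative sketch (not AnyLogic code) of the dispatch function described above:
# hand waiting chats to the first agent that still has free capacity.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ChatAgent:
    name: str
    capacity: int                  # maps to the agent parameter sizing its inner resource pool
    active: list = field(default_factory=list)

    def has_capacity(self) -> bool:
        return len(self.active) < self.capacity


waiting = deque()                  # stands in for the queue block in front of the agents
agents = [ChatAgent("employee-1", capacity=3), ChatAgent("employee-2", capacity=2)]


def dispatch() -> None:
    """Give the first waiting chat to the first available agent, if any.

    Call this on every chat arrival, on every chat completion, and again
    right after an agent accepts a chat (more chats may still be waiting).
    """
    if not waiting:
        return
    for agent in agents:
        if agent.has_capacity():
            agent.active.append(waiting.popleft())
            dispatch()             # keep going while both chats and capacity remain
            return


def on_chat_arrival(chat) -> None:
    waiting.append(chat)
    dispatch()


def on_chat_finished(agent: ChatAgent, chat) -> None:
    agent.active.remove(chat)
    dispatch()
```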

What causes cold start in serverless [closed]

I have read enough papers on serverless cold start, but have not found a clear explanation of what causes it. Could you explain it from both the commercial and the open-source platforms' points of view?
Commercial platforms such as AWS Lambda or Azure Functions: I know they are more like a black box to us.
Open-source platforms such as OpenFaaS, Knative, or OpenWhisk: do those platforms also have a cold start issue?
My initial understanding of cold start latency is that it is the time spent spinning up a container. Once the container is up, it can be reused as long as it has not been killed yet, so there is a warm start. Is this understanding actually correct? I have tried running a container locally from an image, and no matter how large the image is, the latency is close to none.
Is the image download time also part of the cold start? But no matter how many cold starts happen on one node, only one image download is needed, so this seems to make little sense.
Maybe a different question: I also wonder what happens when we instantiate a container from an image. Are the executable and its dependent libraries (e.g., Python libraries) copied from disk into memory at this stage? What if there are multiple containers based on the same image? I guess there should be multiple copies from disk to memory, because each container is an independent process.
There are a lot of levels of "cold start" that all add latency. The hottest of the hot paths is when the container is still running and additional requests can be routed to it. The coldest is a brand new node, which has to pull the image, start the container, register with service discovery, wait for the serverless plane's routing to update, and probably a few more steps if you dig deep enough. Some of those can happen in parallel, but most can't. If the pod has been shut down because it wasn't being used, and the next run schedules on the same machine, then yes, the kubelet usually skips pulling the image (unless imagePullPolicy: Always is forced somewhere), so you get a somewhat faster launch. Kubernetes' scheduler doesn't generally optimize for that, though.
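To make those levels concrete, here is a back-of-the-envelope sketch. The stage timings are invented placeholders, not measurements from any real platform; the point is only that a warm request skips almost every stage a cold one has to pay for, and that a cached image removes just one of them.

```python
# Back-of-the-envelope illustration of how cold-start stages stack up.
# All timings are made-up placeholders, not measurements of any real platform.
COLD_STAGES = {
    "schedule onto a node": 0.5,              # seconds, illustrative
    "pull image (cache miss)": 8.0,
    "create and start container": 1.0,
    "runtime/app initialization": 1.5,
    "register endpoint / update routing": 0.5,
}

WARM_STAGES = {
    "route to an already-running container": 0.005,
}


def total(stages: dict) -> float:
    return sum(stages.values())


if __name__ == "__main__":
    print(f"cold path          ~ {total(COLD_STAGES):.2f}s")
    # Same node, image already cached: skip the pull but still start a container.
    cached = {k: v for k, v in COLD_STAGES.items() if not k.startswith("pull image")}
    print(f"cold, image cached ~ {total(cached):.2f}s")
    print(f"warm path          ~ {total(WARM_STAGES):.3f}s")
```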

Is there a way for an application or a system to update without shutting down? [closed]

I work in a hospital where the system shuts down while updating, leaving all orders hanging with no approvals or modifications. Considering it's a hospital, this is a huge problem. So my question is: how can we update the system without shutting it down? I'm most interested in rolling updates where there's no downtime.
This is a very broad question, but generally, yes, it is perfectly possible to update a system without shutting down the system.
The simplest possible solution is to have a duplicate system. Let's say you are currently working with System A. When you want to do an update, you update System B. The update can take as long as it needs, since you are not using System B. There will be no impact at all.
Once the update is finished, you can test the hell out of System B to make sure the update didn't break anything. Again, this has no impact on working with the system. Only after you are satisfied that the update didn't break anything, do you switch over to using System B.
This switchover is near instantaneous.
If you discover later that there are problems with the update, you can still switch back to System A which is still running the old version.
For the next update, you again update the system which is currently not in use (in this case System A) and follow all the same steps.
You can do the same if you have a backup system. Update the backup system, then fail over, then update the main system. Just be aware of the fact that while the update is happening, you do not have a backup system. So, if the main system crashes during the update process, you are in trouble. (Thankfully, this is not entirely as bad as it sounds, because at least you will already have a qualified service engineer on the system anyway who can immediately start working on either pushing the update forward to get the backup online or fixing the problem with the main system.)
The same applies when you have a redundant system. You can temporarily disable redundancy, then update the disabled system, flip over, do it again. Of course, just like in the last option, you are operating without a safety net while the update process is ongoing.
If your system is a cluster system, it's even easier. If you have enough resources, you can take one machine out of the cluster, update it, then add it back into the cluster again, then do the next machine, and so on. (This is called a "rolling update", and is how companies like Netflix, Google, Amazon, Microsoft, Salesforce, etc. are able to never have any downtime.)
If you don't have enough resources, you can add a machine to the cluster just for the update, and then you are back to the situation that you do have enough resources.
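A minimal sketch of that duplicate-system idea: keep a pointer to whichever environment is live, health-check the freshly updated one, and only flip the pointer once it passes. The health-check URLs and the file-based "live pointer" are assumptions for illustration; in a real setup the pointer would be a load-balancer configuration, a DNS record, or similar.

```python
# Hedged sketch of the A/B (duplicate system) switchover described above.
# The health-check URLs and the "live pointer" file are illustrative assumptions.
import json
import urllib.request
from pathlib import Path

LIVE_POINTER = Path("live_system.json")        # records which system serves traffic

SYSTEMS = {
    "A": "http://system-a.internal/health",    # placeholder URLs
    "B": "http://system-b.internal/health",
}


def is_healthy(url: str) -> bool:
    """Very rough health check: HTTP 200 within 5 seconds."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False


def switch_to(name: str) -> None:
    """Flip the live pointer only after the target system passes its health check."""
    if not is_healthy(SYSTEMS[name]):
        raise RuntimeError(f"System {name} is not healthy; keeping the current system live.")
    LIVE_POINTER.write_text(json.dumps({"live": name}))
    print(f"System {name} is now live; the other system stays on the old version for rollback.")


# Typical cycle: update and test the idle system, then switch_to("B");
# for the next release, update the now-idle "A" and switch back.
```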
Yes.
Every kind of component can be updated without a reboot.
On Windows, you can always postpone reboots.

Maintaining a large website [closed]

When I update a website, I just replace the content with a new file.
How do larger websites update their content while thousands of visitors are viewing the site?
For example, how do Facebook or Twitter do it, with thousands of developers working and millions of visitors on the website? Are they working on a duplicate of the website and then switching the DNS? Are they using Git?
Blue-Green is a widely used deployment strategy that avoids downtime.
First of all, you need a router/load balancer that can forward requests addressed to a virtual IP on to an actual machine. Where I work, we use F5.
You must also have two production environments, called "blue" and "green".
Only one of them is "live" at any time.
By this, I mean that your router must forward all incoming requests to either the "blue" environment, or the "green" environment.
Let's say "green" is live, and you need to release a new version of your app to production.
You deploy your new content/application to your "blue" environment (remember, no requests are being routed there, so the environment is "offline").
Then you test your "blue" environment and make sure everything's been deployed correctly before going live.
Then you change your router to forward all requests to your new and stable "blue" env.
If after going live you discover there's a bug, simply rollback by changing your router again to route all requests to your "green" environment, with the "old" application.
More about blue-green deployments here: BlueGreenDeployment
Another well known deployment strategy is the Canary Release, which enables new features for a small number of users, and once everything's been tested properly, it's enabled for all users.
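To illustrate the canary idea, here is a tiny sketch of a percentage-based rollout: a stable hash of the user ID decides whether a given user sees the new version, and the percentage is raised as confidence grows. The hashing scheme, the feature name, and the 5% starting value are assumptions made up for the example.

```python
# Minimal sketch of a percentage-based canary / feature-flag check.
# Hash scheme, feature name, and rollout percentages are illustrative assumptions.
import hashlib


def in_canary(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically place roughly `percent` of users in the canary group.

    The same user always gets the same answer for the same feature, so their
    experience stays stable while the rollout percentage is gradually increased.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100          # bucket in [0, 100)
    return bucket < percent


if __name__ == "__main__":
    # Start by exposing the new version to ~5% of users, then raise the number
    # (25, 50, 100) once monitoring shows the canary is healthy.
    users = [f"user-{i}" for i in range(1000)]
    canary_users = [u for u in users if in_canary(u, "new-checkout", percent=5)]
    print(f"{len(canary_users)} of {len(users)} users routed to the canary")
```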
They all work with version control systems like Git, SVN, etc., so a team with different roles can push and review commits before they are pulled onto the live environment (pull requests). The big sites also have a really big testing infrastructure.

When should I commit and push in Test Driven Development? [closed]

My question regards TDD: when should I commit or push changes?
I'm wondering whether it is fine to push code where some values returned by a function are still faked, or the implementation is just the obvious minimum to pass the test, but the interface already exists.
In other words, may I push the code before the refactor step? Or put differently: may I push code which does not change the "interface" but does not actually do the work yet?
I'm not talking about unit tests, but rather about integration/acceptance end-to-end tests where, e.g., I get some data from tool A, send it to tool B, and check whether a database record was created. Implementing such tests is often time-consuming and involves many asserts at the end, but pushing an early version of the code allows another team member to build on our part of the work.
Thank you in advance for answering this question.
A development workflow is always some consensus between the developers that it involves, so there are no fixed rules in play here. You need to coordinate with the other developers to figure out what works best for you.
That said, my personal approach here is that you should never break the remote master. Instead, commit and push to a branch as soon as you have something that compiles (even if the tests fail), and then merge with master once your tests pass, i.e., only push working code to master, but have your non-working tests on a branch whenever you like.
If you have any kind of continuous integration system in place, this workflow ensures that you never end up breaking your CI build by pushing a bunch of failing tests to the branch that the CI system is going to pick up and test.
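To make that concrete, here is a small pytest-flavored sketch of what "push the interface with a faked implementation on a branch" can look like: the function already has its final signature, the return value is hard-coded, and the not-yet-passing end-to-end test is marked xfail so a CI run on the branch still reports cleanly. The module, function, and expected values are invented for the example.

```python
# Hypothetical code pushed early on a feature branch: the interface exists,
# the implementation is still a hard-coded fake, and the real end-to-end
# expectation is marked xfail so CI on the branch stays green.
import pytest


def fetch_order_total(order_id: str) -> float:
    """Final signature; the real lookup via tool A / tool B is not implemented yet."""
    return 42.0  # faked value so callers and tests can already be written


def test_fetch_order_total_interface():
    # Already passes: it pins down the interface other team members rely on.
    assert isinstance(fetch_order_total("order-1"), float)


@pytest.mark.xfail(reason="tool A -> tool B -> database flow not implemented yet")
def test_order_creates_database_record():
    # The eventual acceptance test; expected to fail until the real work lands.
    total = fetch_order_total("order-1")
    assert total == 99.90  # illustrative value expected from the real systems
```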