ECS Service Discovery is updated too late after task is stopped - amazon-ecs

Hi,
I'm running 2 AWS ECS services (A and B) within the same cluster, using the Fargate launch type.
Service A should be able to connect to service B. This is possible using Service Discovery.
I created a service discovery entry backend.local with a TTL of 15 seconds. The tasks in service B are added to a target group which has a de-registration delay of 30 seconds.
+--------------+      +-------------+      +--------------+
| Application  +----->|   ECS: A    +----->|   ECS: B     |
|     Load     |      +-------------+      +--------------+
|   Balancer   |      |   Task 1    |      |   Task 1     |
+--------------+      |   Task 2    |      |   Task .     |
                      +-------------+      |   Task n     |
                                           +--------------+
This works perfectly: from service A, I can send requests to http://backend.local, which are routed to one of the tasks in service B.
However, after a rolling deploy of service B, the service discovery DNS records aren't updated in time, so nslookup backend.local still returns IP addresses of old tasks which are not available anymore.
The lifecycle of tasks during a deployment is:
New task: Pending -> Activating -> Running
Old task: Running -> Deactivating -> Stopped
I would expect new tasks to become discoverable AFTER they are 'Running', and old tasks to no longer be discoverable once the target group's de-registration delay kicks in.
How can I make sure that the Service Discovery doesn't make old tasks discoverable?
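For reference, the service discovery part of this setup can be described with boto3 roughly as follows. This is only a sketch of the configuration mentioned above, under the assumption that the Cloud Map service for backend.local was created along these lines; the namespace ID and names are placeholders, and it is not by itself a fix for the stale-record problem:

# Sketch: how a Cloud Map service like backend.local might be registered.
# The namespace ID is a placeholder; adjust names/TTL to your setup.
import boto3

servicediscovery = boto3.client("servicediscovery")

response = servicediscovery.create_service(
    Name="backend",                           # resolves as backend.local inside the VPC
    NamespaceId="ns-xxxxxxxxxxxxxxxx",        # placeholder: the private DNS namespace ".local"
    DnsConfig={
        "RoutingPolicy": "MULTIVALUE",
        "DnsRecords": [{"Type": "A", "TTL": 15}],   # the 15-second TTL mentioned above
    },
    # With a custom health check, ECS itself reports instance health when tasks
    # start and stop, instead of relying on a separate Route 53 health check.
    HealthCheckCustomConfig={"FailureThreshold": 1},
)
print(response["Service"]["Arn"])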

Related

How to get all the pods that are killed by OOMKilled in AKS Log Analytics?

I have a Kubernetes cluster in Azure (AKS) with Log Analytics enabled. I can see that a lot of pods are being killed with the OOMKilled message, but I want to troubleshoot this with Log Analytics in Azure. My question is: how can I track or query, from Log Analytics, all the pods that are killed with the OOMKilled reason?
Thanks!
The reason is somewhat hidden in the ContainerLastStatus field (JSON) of the KubePodInventory table. A query to get all pods killed with reason OOMKilled could be:
KubePodInventory
| where PodStatus != "Running"
| extend ContainerLastStatusJSON = parse_json(ContainerLastStatus)
| extend FinishedAt = todatetime(ContainerLastStatusJSON.finishedAt)
| where ContainerLastStatusJSON.reason == "OOMKilled"
| distinct PodUid, ControllerName, ContainerLastStatus, FinishedAt
| order by FinishedAt asc
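If you prefer to run this outside the portal, a rough sketch with the azure-monitor-query Python package could look like the following. The package choice and the workspace ID placeholder are assumptions on top of the original answer, not part of it:

# Sketch: run the OOMKilled query above against a Log Analytics workspace.
# Requires the azure-monitor-query and azure-identity packages; the workspace
# ID below is a placeholder.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

QUERY = """
KubePodInventory
| where PodStatus != "Running"
| extend ContainerLastStatusJSON = parse_json(ContainerLastStatus)
| extend FinishedAt = todatetime(ContainerLastStatusJSON.finishedAt)
| where ContainerLastStatusJSON.reason == "OOMKilled"
| distinct PodUid, ControllerName, ContainerLastStatus, FinishedAt
| order by FinishedAt asc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(
    workspace_id="<log-analytics-workspace-id>",   # placeholder
    query=QUERY,
    timespan=timedelta(days=7),                    # look back one week
)

# Each row follows the projected columns: PodUid, ControllerName,
# ContainerLastStatus, FinishedAt.
for table in response.tables:
    for row in table.rows:
        print(row)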

Require just One Approval for DevOps Multi-stage Pipeline

I have a multi-stage YAML pipeline:
Build Solution
      |
      ▼
Terraform DEV           \
      |                 |
      ▼                 |
Deploy Function App     |  DEV Environment (No Approval Required)
      |                 |
      ▼                 |
Provision API Mgmt      /
      |
      ▼
Terraform TEST          \
      |                 |
      ▼                 |
Deploy Function App     |  TEST Environment (Approval Required)
      |                 |
      ▼                 |
Provision API Mgmt      /
I have two environments configured (DEV and TEST), with an approval configured on the TEST environment, and the Terraform TEST stage has a deployment job configured to use the TEST environment. This means that when the pipeline reaches the Terraform TEST stage, an email is sent to the approvers for the TEST environment and the pipeline waits.
When that stage is approved, the build continues.
The Deploy Function App stage also has a deployment job targeting the environment for that part of the pipeline. My issue is that when it reaches the Deploy Function App stage for the TEST environment, it again asks for approval to deploy to the TEST environment.
My question is: is this fixed behaviour? I.e. whenever a deployment is made to an environment with an approval, is a new approval required? Or is there a way to change it so a pipeline only needs one approval to deploy (as many times as required) to a specific environment?
This is by design. One scenario for this: if you are rolling back changes to a previous pipeline run, it is best practice to require an approval before redeploying code to the environment. As for the scenario where you have three stages and each one requires an approval, this is also by design:
A stage can consist of many jobs, and each job can consume several resources. Before the execution of a stage can begin, all checks on all the resources used in that stage must be satisfied. Azure Pipelines pauses the execution of a pipeline prior to each stage, and waits for all pending checks to be completed. Checks are re-evaluated based on the retry interval specified in each check. If all checks are not successful until the timeout specified, then that stage is not executed. If any of the checks terminally fails (for example, if you reject an approval on one of the resources), then that stage is not executed.
In your given scenario, may I suggest that the Terraform, Function App and APIM deployments be part of the same stage? Each of these jobs could also be templatized so you can reuse them in your additional environments. This would eliminate the possibility that a user approves the stages in the wrong order (unless you have dependsOn outlined), or that the Terraform apply is the only part that is released.

How to get the number of pods in AKS that were active in a given time frame

So, I'm having an unexpectedly hard time figuring this out. I have a Kubernetes cluster deployed in AKS. In Azure (or the Kubernetes dashboard), how do I view how many active pods there were in a given time frame?
Updated 0106:
You can use the query below to count the number of active pods:
KubePodInventory
| where TimeGenerated > ago(2d) // set the time frame to 2 days
| where PodStatus == "Running"
| distinct PodUid // one row per pod, not per inventory snapshot
| count
Original answer:
If you have configured monitoring, then you can use a Kusto query to fetch it.
The steps are as below:
1. Go to the Azure portal -> your AKS.
2. In the left panel -> Monitoring -> click Logs.
3. In the table named KubePodInventory, there is a field PodStatus which you can use as a filter in your query. You can write your own Kusto query and specify the time range via the portal (by clicking the Time range button) or in the query (by using the ago() function). You should also use the count() function to count the number of pods.

AppFog claims I have used both of my two service slots when I have none in use

AppFog claims that I am using both of my available services, when in reality I have deleted the services that were in use. (I had MySQL databases attached to a couple of apps I was using, but when I deleted the apps, I also deleted the services... they just never freed up for some reason.)
Anyone have any suggestions on how I might reclaim those lost services? It's kinda hard to have apps without services, and it won't show me anything to unbind or delete in order to free up those slots.
-Thanks
C:\Sites>af info
AppFog Free Your Cloud Edition
For support visit http://support.appfog.com
Target:  https://api.appfog.com (v0.999)
Client:  v0.3.18.12
User:    j****g@gmail.com
Usage:   Memory   (0B of 512.0M total)
         Services (2 of 2 total)
         Apps     (0 of 2 total)
C:\Sites>af services
============== System Services ==============
+------------+---------+-------------------------------+
| Service | Version | Description |
+------------+---------+-------------------------------+
| mongodb | 1.8 | MongoDB NoSQL store |
| mongodb2 | 2.4.8 | MongoDB2 NoSQL store |
| mysql | 5.1 | MySQL database service |
| postgresql | 9.1 | PostgreSQL database service |
| rabbitmq | 2.4 | RabbitMQ message queue |
| redis | 2.2 | Redis key-value store service |
+------------+---------+-------------------------------+
=========== Provisioned Services ============
Probably easiest to email support@appfog.com and get them to look into it.

Cloud Foundry multi-node push

Can you create application instances on multiple nodes with a single push command? If so, what is the process for that? Would you create multiple DEA instances?
So, for a configuration like this, would a "vmc push appname --instances 4" create:
REST _ _ _ _ _ _ _ _ _
 |      |      |      |
DEA    DEA    DEA    DEA
App1   App2   App3   App4
Or do you have to push instances manually to each DEA node?
If you run the command as you've listed it, yes, it will basically cause your application to be started "at once" on 4 DEAs. You don't need to run vmc push for each instance - in fact, you can't use vmc to individually address a DEA.
Once a DEA is running your app, you can have the app show e.g. the IP address and port the DEA is running on by interrogating the VCAP environment.
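For example, a Python app could read that information roughly like this; the exact environment variables and JSON fields vary between (DEA-era) Cloud Foundry releases, so treat the names below as assumptions:

# Sketch: inspect the VCAP environment from inside the application.
# Variable/field names are assumptions and may differ per Cloud Foundry release.
import json
import os

vcap_app = json.loads(os.environ.get("VCAP_APPLICATION", "{}"))

print("instance index:", vcap_app.get("instance_index"))
print("host:", vcap_app.get("host", os.environ.get("VCAP_APP_HOST")))
print("port:", vcap_app.get("port", os.environ.get("VCAP_APP_PORT")))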