I'm trying to scale a Kubernetes service using a HorizontalPodAutoscaler and an external metric from SQS. I see there are two separate metrics: ApproximateNumberOfMessagesVisible and ApproximateNumberOfMessagesNotVisible.
Using the number of visible messages causes processing pods to be targeted for termination immediately after they pick a message up from the queue, since the message is then no longer visible. If I use the number of messages not visible, the HPA never scales up.
Here is the ExternalMetric template:
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: metric-name
spec:
  name: metric-name
  queries:
    - id: metric_name
      metricStat:
        metric:
          namespace: "AWS/SQS"
          metricName: "ApproximateNumberOfMessagesVisible"
          dimensions:
            - name: QueueName
              value: "queue_name"
        period: 60
        stat: Average
        unit: Count
      returnData: true
Here is the HPA template:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  name: hpa-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: deployment-name
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metricName: metric-name
        targetAverageValue: 1
The problem would be solved if I could define another custom metric that is the sum of these two. How else can I solve this problem?
We used a Lambda to fetch the two metrics and publish a custom metric that is the sum of in-flight and waiting messages, and we trigger that Lambda with a CloudWatch Events rule at whatever frequency you want: https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#rules:action=create
Here is the Lambda code for reference:
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch({region: ''}); // fill region here
const sqs = new AWS.SQS();
const SQS_URL = ''; // fill queue url here

async function getSqsMetric(queueUrl) {
  var params = {
    QueueUrl: queueUrl,
    AttributeNames: ['All']
  };
  return new Promise((res, rej) => {
    sqs.getQueueAttributes(params, function(err, data) {
      if (err) rej(err);
      else res(data);
    });
  });
}

function buildMetric(numMessages) {
  return {
    Namespace: 'yourcompany-custom-metrics',
    MetricData: [{
      MetricName: 'mymetric',
      Dimensions: [{
        Name: 'env',
        Value: 'prod'
      }],
      Timestamp: new Date(),
      Unit: 'Count',
      Value: numMessages
    }]
  };
}

async function pushMetrics(metrics) {
  await new Promise((res) => cloudwatch.putMetricData(metrics, (err, data) => {
    if (err) {
      console.log('err', err, err.stack); // an error occurred
      res(err);
    } else {
      console.log('response', data); // successful response
      res(data);
    }
  }));
}

exports.handler = async (event) => {
  console.log('Started');
  const sqsMetrics = await getSqsMetric(SQS_URL).catch(console.error);
  var queueSize = null;
  if (sqsMetrics) {
    console.log('Got sqsMetrics', sqsMetrics);
    if (sqsMetrics.Attributes) {
      // Sum of waiting (visible) and in-flight (not visible) messages
      queueSize = parseInt(sqsMetrics.Attributes.ApproximateNumberOfMessages) +
        parseInt(sqsMetrics.Attributes.ApproximateNumberOfMessagesNotVisible);
      console.log('Pushing', queueSize);
      await pushMetrics(buildMetric(queueSize));
    }
  } else {
    console.log('Failed fetching sqsMetrics');
  }
  const response = {
    statusCode: 200,
    body: JSON.stringify('Pushed ' + queueSize),
  };
  return response;
};
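Once the Lambda is publishing, the ExternalMetric from the question can point at the custom metric instead of AWS/SQS. A sketch reusing the namespace, metric name, and dimension from the Lambda above:
apiVersion: metrics.aws/v1alpha1
kind: ExternalMetric
metadata:
  name: metric-name
spec:
  name: metric-name
  queries:
    - id: metric_name
      metricStat:
        metric:
          namespace: "yourcompany-custom-metrics"
          metricName: "mymetric"
          dimensions:
            - name: env
              value: "prod"
        period: 60
        stat: Average
        unit: Count
      returnData: true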
This seems to be a case of thrashing: the number of replicas keeps fluctuating because of the dynamic nature of the metrics being evaluated.
IMHO, you've got a couple of options here.
You could look at adding a stabilization window to your HPA and probably also limiting the scale-down rate, as sketched below. You'd have to try a few combinations of metrics and see what works best for you, since you know best the nature of the metrics (ApproximateNumberOfMessagesVisible in this case) you see in your infrastructure.
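For illustration, here is a minimal sketch of a scale-down stabilization window on the newer autoscaling/v2 API (the behavior field exists since v2beta2); the window and policy values are placeholder assumptions to tune, not recommendations:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name
  minReplicas: 1
  maxReplicas: 50
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # use the highest recommendation seen over the last 5 minutes
      policies:
        - type: Pods
          value: 2            # remove at most 2 pods...
          periodSeconds: 60   # ...per minute
  metrics:
    - type: External
      external:
        metric:
          name: metric-name
        target:
          type: AverageValue
          averageValue: "1"
With this in place, a pod that has just picked up a message (making the message invisible) is less likely to be killed before the next metric evaluation catches up.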
Related
I created an Application Gateway and then an AKS cluster using C# (Pulumi Azure Native). The resources are created successfully (code is given below). While creating the AKS cluster, I enabled the AGIC add-on:
AddonProfiles = {
    { "ingressApplicationGateway", new ManagedClusterAddonProfileArgs { /* ... */ } }
}
I am trying to achieve fanout ingress with the AGIC add-on (apps will run in AKS and will be served by the Application Gateway). The problem is that the ingress does not work unless I manually assign the Contributor role to the AGIC add-on's managed identity at scope 'ApplicationGatewayResourceId'.
So far, I have done the following:
- Not working: created an ingress (see the Kubernetes manifest demo-api.yaml below), but the ingress does not work.
- Working, but requires manual intervention:
  - a managed identity "ingressapplicationgateway-xxx" is created automatically (in the node resource group "MC_xxx") and used by the AGIC add-on
  - I assigned the Contributor role to the ingressapplicationgateway-xxx managed identity at scope 'ApplicationGatewayResourceId' (using the Azure portal)
  - created the Kubernetes ingress again, and this time it works
  - just to confirm, I removed the Contributor role and re-created the ingress; the ingress did not work
So, ingressapplicationgateway-xxx requires the Contributor role, which I could easily assign using the portal/CLI/PowerShell, but that's not what I want because I am implementing Infrastructure-as-Code using Pulumi C#.
Bring-your-own-identity is not applicable to the AGIC add-on: I thought I would create a managed identity, assign the role, and then use it for the AGIC add-on, but according to the Microsoft documentation, "bring your own identity" is only supported for the API server and kubelet identity (not for the AGIC add-on).
I tried to get the AGIC add-on managed identity ingressapplicationgateway-xxx (which belongs to the node resource group MC_xxx) but failed to do so and asked a question here (no answer yet).
Kubernetes manifest for ingress: demo-api.yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-api
  namespace: default
spec:
  type: ClusterIP
  ports:
    - port: 80
  selector:
    app: demo-api
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-api
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-api
  template:
    metadata:
      labels:
        app: demo-api
    spec:
      containers:
        - name: demo-api
          image: hovermind.azurecr.io/demoapi:latest
          ports:
            - containerPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ingress
  namespace: default
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/backend-path-prefix: "/api/" # API has url prefix -> [Route("api")]
spec:
  rules:
    - host: agic-appgw-public-ip.japaneast.cloudapp.azure.com
      http:
        paths:
          - path: /api*
            pathType: Prefix
            backend:
              service:
                name: demo-api
                port:
                  number: 80
C# Code for Application Gateway
// ... ... ...
var hubAppGwSubnet = new Subnet($"{HubVirtualNetwork}.{ApplicationGatewaySubnet}", new AzureNative.Network.SubnetArgs {
    // ... ... ...
}, new CustomResourceOptions { DependsOn = { hubVnet } });

//
// Create Public IP for App Gateway
//
var agicAppGwPublicIp = new PublicIPAddress(AgicApplicationGatewayPublicIp, new AzureNative.Network.PublicIPAddressArgs {
    // ... ... ...
});

//
// Application Gateway configs from stack settings file
//
var agicAppGwArgs = config.RequireObject<JsonElement>(AgicApplicationGatewayArgs);
var agicAppGwName = agicAppGwArgs.GetName();
var agicAppGwSku = agicAppGwArgs.GetSku();
var agicAppGwMinCapacity = agicAppGwArgs.GetInt(MinCapacity);
var agicAppGwMaxCapacity = agicAppGwArgs.GetInt(MaxCapacity);

// Gateway IP Config (subnet to which Application Gateway would be deployed)
var appGwIpConfig = new ApplicationGatewayIPConfigurationArgs {
    Name = AppGatewayIpConfigName,
    Subnet = new SubResourceArgs {
        Id = hubAppGwSubnet.Id,
    }
};

//
// Create App Gateway
//
var agicAppGw = new ApplicationGateway(AgicApplicationGateway, new ApplicationGatewayArgs {
    ApplicationGatewayName = agicAppGwName,
    ResourceGroupName = mainResourceGroup.Name,
    Sku = new ApplicationGatewaySkuArgs {
        Name = agicAppGwSku,
        Tier = agicAppGwSku
    },
    AutoscaleConfiguration = new ApplicationGatewayAutoscaleConfigurationArgs {
        MinCapacity = agicAppGwMinCapacity,
        MaxCapacity = agicAppGwMaxCapacity
    },
    GatewayIPConfigurations = { appGwIpConfig },
    FrontendIPConfigurations = {
        new ApplicationGatewayFrontendIPConfigurationArgs {
            Name = AppGatewayFrontendIpConfigName,
            PublicIPAddress = new SubResourceArgs {
                Id = agicAppGwPublicIp.Id,
            }
        },
        new ApplicationGatewayFrontendIPConfigurationArgs {
            Name = $"{AppGatewayFrontendIpConfigName}_private",
            PrivateIPAllocationMethod = IPAllocationMethod.Static,
            PrivateIPAddress = "10.10.1.5", //hubApplicationGatewaySubnet.GetFirstUsableIp(),
            Subnet = new SubResourceArgs {
                Id = hubAppGwSubnet.Id
            }
        }
    },
    FrontendPorts = {
        new ApplicationGatewayFrontendPortArgs {
            Name = AppGatewayFrontendPort80Name,
            Port = Port80
        },
    },
    HttpListeners = {
        new ApplicationGatewayHttpListenerArgs {
            Name = AppGatewayHttpListenerName,
            Protocol = Http,
            FrontendIPConfiguration = new SubResourceArgs {
                Id = $"/subscriptions/{subscriptionId}/resourceGroups/{mainResourceGroup.Name}/providers/Microsoft.Network/applicationGateways/{agicAppGwName}/frontendIPConfigurations/{AppGatewayFrontendIpConfigName}"
            },
            FrontendPort = new SubResourceArgs {
                Id = $"/subscriptions/{subscriptionId}/resourceGroups/{mainResourceGroup.Name}/providers/Microsoft.Network/applicationGateways/{agicAppGwName}/frontendPorts/{AppGatewayFrontendPort80Name}",
            },
        }
    },
    BackendAddressPools = {
        new ApplicationGatewayBackendAddressPoolArgs {
            Name = AppGatewayBackendPoolName,
            //BackendAddresses = { }
        }
    },
    BackendHttpSettingsCollection = {
        new ApplicationGatewayBackendHttpSettingsArgs {
            Name = AppGatewayHttpSettingName,
            Port = 80,
            Protocol = Http,
            RequestTimeout = 20,
            CookieBasedAffinity = ApplicationGatewayCookieBasedAffinity.Enabled
        }
    },
    RequestRoutingRules = {
        new ApplicationGatewayRequestRoutingRuleArgs {
            Name = AppGatewayRoutingName,
            RuleType = ApplicationGatewayRequestRoutingRuleType.Basic,
            Priority = 10,
            HttpListener = new SubResourceArgs {
                Id = $"/subscriptions/{subscriptionId}/resourceGroups/{mainResourceGroup.Name}/providers/Microsoft.Network/applicationGateways/{agicAppGwName}/httpListeners/{AppGatewayHttpListenerName}",
            },
            BackendAddressPool = new SubResourceArgs {
                Id = $"/subscriptions/{subscriptionId}/resourceGroups/{mainResourceGroup.Name}/providers/Microsoft.Network/applicationGateways/{agicAppGwName}/backendAddressPools/{AppGatewayBackendPoolName}",
            },
            BackendHttpSettings = new SubResourceArgs {
                Id = $"/subscriptions/{subscriptionId}/resourceGroups/{mainResourceGroup.Name}/providers/Microsoft.Network/applicationGateways/{agicAppGwName}/backendHttpSettingsCollection/{AppGatewayHttpSettingName}",
            },
        }
    },
    WebApplicationFirewallConfiguration = new ApplicationGatewayWebApplicationFirewallConfigurationArgs { },
}, new CustomResourceOptions { DependsOn = { hubAppGwSubnet } });
C# code for AKS Cluster
// ... ... ...
var systemNodePool = new ManagedClusterAgentPoolProfileArgs { /* ... ... ... */ };
var userNodePool = new ManagedClusterAgentPoolProfileArgs { /* ... ... ... */ };
var aadProfile = new ManagedClusterAADProfileArgs { /* ... ... ... */ };
var networkProfile = new ContainerServiceNetworkProfileArgs { /* ... ... ... */ };

//
// Create AKS Cluster
//
var aksAppClusterName = "xxx-aks-appcluster-dev-japaneast";
var aksAppClusterNodeRgName = $"{AksEntities.NodePoolResourceGroupPrefix}_{aksAppClusterName}";
var aksAppClusterDnsPrefix = "xxx-yyy";

var aksAppCluster = new ManagedCluster(AksAppplicationCluster, new ManagedClusterArgs {
    ResourceName = aksAppClusterName,
    ResourceGroupName = mainResourceGroup.Name,
    AgentPoolProfiles = { systemNodePool, userNodePool },
    DnsPrefix = aksAppClusterDnsPrefix,
    EnableRBAC = true,
    AadProfile = aadProfile,
    NetworkProfile = networkProfile,
    //ServicePrincipalProfile = spProfile,
    NodeResourceGroup = aksAppClusterNodeRgName,
    DisableLocalAccounts = true,
    Identity = new ManagedClusterIdentityArgs { Type = AzureNative.ContainerService.ResourceIdentityType.SystemAssigned },
    AddonProfiles = {
        {
            AksEntities.AddonProfileKeys.Agic, new ManagedClusterAddonProfileArgs {
                Enabled = true,
                Config = {
                    { AksEntities.AddonProfileKeys.ApplicationGatewayId, agicAppGw.Id },
                }
            }
        },
    },
}, new CustomResourceOptions { DependsOn = { spokeAksAppplicationSubnet, agicAppGw } });
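One possible direction, offered as a sketch rather than a confirmed solution: if your provider version exposes the add-on identity's object ID on the cluster's AddonProfiles output (an assumption to verify against your Pulumi Azure Native version), the Contributor assignment itself can be declared in Pulumi C#, reusing agicAppGw and subscriptionId from the code above:
// Assumption: the AddonProfiles output surfaces the AGIC add-on identity's ObjectId.
var agicPrincipalId = aksAppCluster.AddonProfiles.Apply(profiles =>
    profiles!["ingressApplicationGateway"].Identity!.ObjectId!);

var agicContributor = new AzureNative.Authorization.RoleAssignment("agic-appgw-contributor",
    new AzureNative.Authorization.RoleAssignmentArgs {
        PrincipalId = agicPrincipalId,
        PrincipalType = AzureNative.Authorization.PrincipalType.ServicePrincipal,
        // b24988ac-6180-42a0-ab88-20f7382dd24c is the well-known definition GUID of the built-in Contributor role
        RoleDefinitionId = $"/subscriptions/{subscriptionId}/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c",
        Scope = agicAppGw.Id,
    });
If AddonProfiles does not surface the identity, the same RoleAssignment resource still applies once the principal ID is obtained some other way.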
I'm trying to restart my Kubernetes deployment via the Kubernetes API using the @kubernetes/client-node library. I'm not using deployment scaling because I only need one deployment (db and service container) per app.
I also tried to restart a single container inside the deployment via exec (/sbin/reboot or kill), but that does not seem to work with the Node.js library: it fails to upgrade to a WebSocket connection, which the Kubernetes exec endpoint apparently requires. The other idea was to restart the whole deployment by setting the scale to 0 and then back to 1, but I can't get that working with the Node.js library either. I tried to find an example for that but was not successful.
A rolling restart does not work for me because my application doesn't support multiple instances.
I tried to scale like this:
await k8sApi.patchNamespacedDeploymentScale(`mydeployment-name`, 'default', {
  spec: { replicas: 0 },
});
await k8sApi.patchNamespacedDeploymentScale(`mydeployment-name`, 'default', {
  spec: { replicas: 1 },
});
and to reboot the container I tried this:
await coreV1Api.connectPostNamespacedPodExec(
  podName,
  'default',
  '/sbin/reboot',
  'web',   // container
  false,   // stderr
  false,   // stdin
  false,   // stdout
  false    // tty
);
Extra input:
When trying to use patchNamespacedDeployment, I get the following error back from the Kubernetes API:
statusCode: 415,
statusMessage: 'Unsupported Media Type',
And the response body:
V1Scale {
  apiVersion: 'v1',
  kind: 'Status',
  metadata: V1ObjectMeta {
    annotations: undefined,
    clusterName: undefined,
    creationTimestamp: undefined,
    deletionGracePeriodSeconds: undefined,
    deletionTimestamp: undefined,
    finalizers: undefined,
    generateName: undefined,
    generation: undefined,
    labels: undefined,
    managedFields: undefined,
    name: undefined,
    namespace: undefined,
    ownerReferences: undefined,
    resourceVersion: undefined,
    selfLink: undefined,
    uid: undefined
  },
  spec: undefined,
  status: V1ScaleStatus { replicas: undefined, selector: undefined }
}
When trying the exec approach, I get the following response:
{
  kind: 'Status',
  apiVersion: 'v1',
  metadata: {},
  status: 'Failure',
  message: 'Upgrade request required',
  reason: 'BadRequest',
  code: 400
}
I already looked up the 'Upgrade request required' error, and it seems the library isn't aware of it; the library appears to have been generated from the API definitions, so it does not handle WebSockets.
There really does seem to be a bug in the Node Kubernetes client library.
On PATCH requests it should set the content type to "application/json-patch+json", but instead it sends "application/json".
That's why you get Unsupported Media Type back from the API.
Furthermore, you need to use the JSON Patch format for the body you send: http://jsonpatch.com
To manually set the content type, you can pass custom headers to the function call.
This worked for me:
const patch = [
  {
    op: 'replace',
    path: '/spec/replicas',
    value: 0,
  },
];

await k8sApi.patchNamespacedDeployment(
  'mydeployment-name',
  'default',
  patch,
  undefined, // pretty
  undefined, // dryRun
  undefined, // fieldManager
  undefined, // force
  { headers: { 'content-type': 'application/json-patch+json' } }
);
After some googling I found that this problem has existed since 2018: https://github.com/kubernetes-client/javascript/issues/19
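Building on that, a restart without rolling (for a single-instance app like yours) can be emulated by applying the same JSON Patch twice, first with 0 and then with 1 replicas. A minimal sketch under the same assumptions about the client's positional parameters:
// Emulates a restart by scaling the deployment to 0 and back to 1.
// `k8sApi` is an AppsV1Api instance as in the snippet above.
async function restartDeployment(name, namespace) {
  const options = { headers: { 'content-type': 'application/json-patch+json' } };
  const scaleTo = (replicas) => [
    { op: 'replace', path: '/spec/replicas', value: replicas },
  ];
  await k8sApi.patchNamespacedDeployment(
    name, namespace, scaleTo(0),
    undefined, undefined, undefined, undefined, options);
  await k8sApi.patchNamespacedDeployment(
    name, namespace, scaleTo(1),
    undefined, undefined, undefined, undefined, options);
}
Note that scaling to 0 and immediately back up does not wait for the old pod to terminate; you may want to poll the deployment status between the two patches.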
I created a CfnDomain in AWS CDK and I was trying to get the generated domain name to create an alarm.
const es = new elasticsearch.CfnDomain(this, id, esProps);

new cloudwatch.CfnAlarm(this, "test", {
  ...
  dimensions: [
    {
      name: "DomainName",
      value: es.domainName,
    },
  ],
});
But it seems that the domainName attribute is actually the argument I passed in (I passed none, so it will be autogenerated), so it is undefined and can't be used.
Is there any way to wait for the Elasticsearch cluster to be created so that I can obtain the generated domain name, or is there any other way to create an alarm on the cluster's metrics?
You can use CfnDomain.ref as the DomainName dimension value. Sample alarm creation for red cluster status:
const domain: CfnDomain = ...;

const elasticDimension = {
  "DomainName": domain.ref,
};

const metricRed = new Metric({
  namespace: "AWS/ES",
  metricName: "ClusterStatus.red",
  statistic: "maximum",
  period: Duration.minutes(1),
  dimensions: elasticDimension
});

const redAlarm = metricRed.createAlarm(construct, "esRedAlarm", {
  alarmName: "esRedAlarm",
  evaluationPeriods: 1,
  threshold: 1
});
I've declared a Kubernetes deployment like this:
const ledgerDeployment = new k8s.extensions.v1beta1.Deployment("ledger", {
  spec: {
    template: {
      metadata: {
        labels: {name: "ledger"},
        name: "ledger",
        // namespace: namespace,
      },
      spec: {
        containers: [
          ...
        ],
        volumes: [
          {
            emptyDir: {},
            name: "gunicorn-socket-dir"
          }
        ]
      }
    }
  }
});
Later on in my index.ts I want to conditionally modify the volumes of the deployment. I think this is a quirk of Pulumi I haven't wrapped my head around yet, but here's my current attempt:
if (myCondition) {
  ledgerDeployment.spec.template.spec.volumes.apply(volumes =>
    volumes.push({
      name: "certificates",
      secret: {
        items: [
          {key: "tls.key", path: "proxykey"},
          {key: "tls.crt", path: "proxycert"}
        ],
        secretName: "star.builds.qwil.co"
      }
    })
  );
}
When I do this I get the following error: Property 'mode' is missing in type '{ key: string; path: string; }' but required in type 'KeyToPath'.
I suspect I'm using apply incorrectly. When I try to modify ledgerDeployment.spec.template.spec.volumes.push() directly, I get the error Property 'push' does not exist on type 'Output<Volume[]>'.
What is the pattern for modifying resources in Pulumi? How can I add a new volume to my deployment?
It's not possible to modify resource inputs after the resource has been created. Instead, you should place all the logic that defines the shape of the inputs before you call the constructor.
In your example, this could be:
let volumes = [
  {
    emptyDir: {},
    name: "gunicorn-socket-dir"
  }
];

if (myCondition) {
  volumes.push({...});
}

const ledgerDeployment = new k8s.extensions.v1beta1.Deployment("ledger", {
  // <-- use `volumes` here
});
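Put together with the deployment from the question, the complete pattern could look like the following sketch (the certificates volume is copied from the question's snippet; myCondition and the container list are assumed to be defined elsewhere):
// Compute the full volume list before constructing the resource.
const volumes: k8s.types.input.core.v1.Volume[] = [
  { name: "gunicorn-socket-dir", emptyDir: {} },
];

if (myCondition) {
  volumes.push({
    name: "certificates",
    secret: {
      secretName: "star.builds.qwil.co",
      items: [
        { key: "tls.key", path: "proxykey" },
        { key: "tls.crt", path: "proxycert" },
      ],
    },
  });
}

const ledgerDeployment = new k8s.extensions.v1beta1.Deployment("ledger", {
  spec: {
    template: {
      metadata: { name: "ledger", labels: { name: "ledger" } },
      spec: {
        containers: [/* ... */],
        volumes: volumes, // plain input array; no apply() needed
      },
    },
  },
});
This also sidesteps the 'mode' type error: since volumes is a plain input array rather than an Output, it is checked against the input types, where mode is optional.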
const k8s = require('kubernetes-client');

const endpoint = 'https://' + IP;
const ext = new k8s.Extensions({
  url: endpoint,
  version: 'v1beta1',
  insecureSkipTlsVerify: true,
  namespace,
  auth: {
    bearer: token,
  },
});

const body = {
  spec: {
    template: {
      spec: {
        metadata: [{
          name,
          image,
        }]
      }
    }
  }
};

ext.namespaces.deployments(name).put({ body }, (err, response) => { console.log(response); });
The above code seems to authenticate fine with GET and PUT; however, I get the following error message when using POST:
the server does not allow this method on the requested resource
I think the problem might be that, due to the Kubernetes 1.6 switch to RBAC, your pod does not have the right privileges to schedule pods, get logs, etc. through the API server.
Make sure you are using the admin.conf kubeconfig.
But be aware that giving the node cluster-admin permissions makes anyone who can access the node a cluster admin ;) A narrower alternative is sketched below.
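For reference, a less drastic option is to grant the service account the client authenticates as only the verbs it needs. A sketch with placeholder names:
# Placeholder names; grants deployment read/update instead of cluster-admin.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-editor
  namespace: default
rules:
  - apiGroups: ["apps", "extensions"]
    resources: ["deployments"]
    verbs: ["get", "list", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-editor-binding
  namespace: default
subjects:
  - kind: ServiceAccount
    name: my-deployer   # the account whose bearer token the client uses
    namespace: default
roleRef:
  kind: Role
  name: deployment-editor
  apiGroup: rbac.authorization.k8s.io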