Envoy is using all the memory and the pods are getting evicted. Is there a way to set a limit on how much memory the Envoy proxy can use in the Envoy configuration file?
You can probably do that by configuring the overload manager in the bootstrap configuration for Envoy; see the Envoy documentation on the overload manager for more details. It is done simply by adding an overload_manager section as follows:
overload_manager:
  refresh_interval: 0.25s
  resource_monitors:
    - name: "envoy.resource_monitors.fixed_heap"
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.resource_monitors.fixed_heap.v3.FixedHeapConfig
        # Tune this for your system; 2 GiB is only an example.
        max_heap_size_bytes: 2147483648 # 2 GiB
  actions:
    - name: "envoy.overload_actions.shrink_heap"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.95
    - name: "envoy.overload_actions.stop_accepting_requests"
      triggers:
        - name: "envoy.resource_monitors.fixed_heap"
          threshold:
            value: 0.98
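With fixed_heap, Envoy starts shrinking its heap at 95% of the configured limit and stops accepting requests at 98%, which should kick in before the kubelet evicts the pod. For that to work, the pod's memory limit has to sit somewhat above max_heap_size_bytes, since Envoy also uses memory outside the heap. A minimal sketch of the container spec (the image tag and sizes are assumptions, not from the question):

# Hypothetical container spec: the memory limit leaves headroom above
# the 2 GiB fixed_heap limit so the overload manager fires before eviction.
containers:
  - name: envoy
    image: envoyproxy/envoy:v1.27.0
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "2.5Gi"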
Having set up Kibana and a Fleet Server, I have now attempted to add APM.
When going through the general setup, I always get an error no matter what is done:
failed to listen:listen tcp *.*.*.*:8200: bind: can't assign requested address
This is when following the steps for setting up APM after creating the Fleet Server.
This is all being launched in Kubernetes, and the documentation has been gone through several times to no avail.
We did discover that we can hit the /intake/v2/events etc. endpoints when shelled into the container, but get a 404 for everything else. It's close but no cigar so far following the instructions.
As it turned out, the general walkthrough is soon to be deprecated in its current form.
Setup is far simpler in a Helm values file, where it's actually possible to configure Kibana with a package reference for your named APM service.
xpack.fleet.packages:
  - name: system
    version: latest
  - name: elastic_agent
    version: latest
  - name: fleet_server
    version: latest
  - name: apm
    version: latest
xpack.fleet.agentPolicies:
  - name: Fleet Server on ECK policy
    id: eck-fleet-server
    is_default_fleet_server: true
    namespace: default
    monitoring_enabled:
      - logs
      - metrics
    unenroll_timeout: 900
    package_policies:
      - name: fleet_server-1
        id: fleet_server-1
        package:
          name: fleet_server
  - name: Elastic Agent on ECK policy
    id: eck-agent
    namespace: default
    monitoring_enabled:
      - logs
      - metrics
    unenroll_timeout: 900
    is_default: true
    package_policies:
      - name: system-1
        id: system-1
        package:
          name: system
      - name: apm-1
        package:
          name: apm
        inputs:
          - type: apm
            enabled: true
            vars:
              - name: host
                value: 0.0.0.0:8200
Making sure these are set in the Kibana Helm values file will allow any spun-up Fleet Server to automatically register as having APM.
The missing key in seemingly all the documentation is the need for an APM service.
The simplest example of which is here:
Example yaml scripts
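For what it's worth, a plain Kubernetes Service exposing port 8200 of the Agent pods is the simplest form of that APM service. A minimal sketch, assuming an ECK-managed Agent named fleet-server (the selector label and all names here are assumptions for this setup):

# Hypothetical Service fronting the APM intake port; adjust the
# selector to match the labels on your Fleet Server / Agent pods.
apiVersion: v1
kind: Service
metadata:
  name: apm
  namespace: default
spec:
  selector:
    agent.k8s.elastic.co/name: fleet-server # assumed ECK Agent name
  ports:
    - port: 8200
      targetPort: 8200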
I am trying to create an alarm for a SageMaker endpoint using CloudFormation. My endpoint has two variants. My CloudFormation template looks similar to the below:
MySagemakerAlarmCPUUtilization:
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmName: MySagemakerAlarmCPUUtilization
    AlarmDescription: Monitor the CPU levels of the endpoint
    MetricName: CPUUtilization
    ComparisonOperator: GreaterThanThreshold
    Dimension:
      - Name: EndpointName
        Value: my-endpoint
      - Name: VariantName
        Value: variant1
    Namespace: AWS/SageMaker/Endpoints
    EvaluationPeriods: 1
    Period: 600
    Statistic: Average
    Threshold: 50
I am having an issue, though, with the dimension part: I get an invalid property error there. Does anyone know the correct syntax to target a particular variant of an endpoint in CloudFormation?
Realised I just had a typo in this: the property should read Dimensions. So:
Dimensions:
  - Name: EndpointName
    Value: my-endpoint
  - Name: VariantName
    Value: variant1
Otherwise the code is right, if anyone else wants to use it.
I'm currently getting a 413 Request Entity Too Large when posting something routed through a Spring Cloud Gateway. It works when the request body isn't larger than around 3 MB.
Here is my application.yml (scrubbed):
spring:
  profiles:
    active: prod
  main:
    allow-bean-definition-overriding: true
  application:
    name: my-awesome-gateway
  cloud:
    gateway:
      default-filters:
        - DedupeResponseHeader=Access-Control-Allow-Origin Access-Control-Allow-Credentials, RETAIN_UNIQUE
      routes:
        - id: my-service
          uri: https://myservicesdomainname
          predicates:
            - Path=/service/**
          filters:
            - StripPrefix=1
            - UserInfoFilter
            - name: Hystrix
              args:
                name: fallbackCommand
                fallbackUri: forward:/fallback/first
            - name: RequestSize
              args:
                maxSize: 500000000 # ***** Here is my attempt to increase the size
      httpclient:
        connect-timeout: 10000
        response-timeout: 20000
This is the link I got RequestSize/args/maxSize from:
https://cloud.spring.io/spring-cloud-static/spring-cloud-gateway/2.1.0.RELEASE/multi/multi__gatewayfilter_factories.html#_requestsize_gatewayfilter_factory
Edit:
The issue was with a Kubernetes Ingress Controller. I fixed the issue there and it's now working.
Note that the RequestSize filter only compares the Content-Length request header with the specified limit and rejects right away; it does not count uploaded bytes.
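For anyone hitting the same thing: if the controller in front of the gateway is ingress-nginx (an assumption here — the exact fix depends on which controller you run), the body-size cap is raised with an annotation on the Ingress:

# Hypothetical Ingress metadata; raises nginx's client body size
# limit, which otherwise returns the 413 before the gateway is reached.
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "500m"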
I am trying to set up a CloudWatch alarm in CloudFormation that fires if more than, say, 5000 HTTP requests are sent to an AWS ES cluster. I see there is the ElasticsearchRequests metric I can use, and this is what I have so far:
ClusterElasticsearchRequestsTooHighAlarm:
  Condition: HasAlertTopic
  Type: 'AWS::CloudWatch::Alarm'
  Properties:
    AlarmActions:
      - {'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'}
    AlarmDescription: 'ElasticsearchRequests are too high.'
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: ClientId
        Value: !Ref 'AWS::AccountId'
      - Name: DomainName
        Value: !Ref ElasticsearchDomain
    EvaluationPeriods: 1
    MetricName: 'ElasticsearchRequests'
    Namespace: 'AWS/ES'
    OKActions:
      - {'Fn::ImportValue': !Sub '${ParentAlertStack}-TopicARN'}
    Period: 60
    Statistic: Maximum
    Threshold: 5000
Does this look correct?
Should I use SampleCount instead of Maximum for the Statistic?
Any advice is much appreciated.
According to the AWS docs on monitoring Elasticsearch/OpenSearch clusters, the relevant statistic for the ElasticsearchRequests metric is Sum.
Here is what the docs say:
OpenSearchRequests
The number of requests made to the Elasticsearch/OpenSearch cluster.
Relevant statistics: Sum
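Applied to the template above, that is a one-line change; with Period left at 60, the alarm then compares the total request count per minute against the threshold:

Statistic: Sum # total ElasticsearchRequests per 60-second period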
I am trying to create a YAML file to deploy a GKE cluster in a custom network I created. I get an error:
JSON payload received. Unknown name \"network\": Cannot find field."
I have tried a few names for the resources, but I am still seeing the same issue:
resources:
  - name: myclus
    type: container.v1.cluster
    properties:
      network: projects/project-251012/global/networks/dev-cloud
      zone: "us-east4-a"
      cluster:
        initialClusterVersion: "1.12.9-gke.13"
        currentMasterVersion: "1.12.9-gke.13"
        ## Initial NodePool config.
        nodePools:
          - name: "myclus-pool1"
            initialNodeCount: 3
            version: "1.12.9-gke.13"
            config:
              machineType: "n1-standard-1"
              oauthScopes:
                - https://www.googleapis.com/auth/logging.write
                - https://www.googleapis.com/auth/monitoring
                - https://www.googleapis.com/auth/ndev.clouddns.readwrite
              preemptible: true
  ## Duplicates node pool config from v1.cluster section, to get it explicitly managed.
  - name: myclus-pool1
    type: container.v1.nodePool
    properties:
      zone: us-east4-a
      clusterId: $(ref.myclus.name)
      nodePool:
        name: "myclus-pool1"
I expect it to place the cluster nodes in this network.
The network field needs to be part of the cluster spec. The top level of properties should contain just zone and cluster; network should be at the same indentation as initialClusterVersion. See the container.v1.cluster API reference page for more.
Your manifest should look more like:
EDIT: There is some confusion in the API reference docs concerning deprecated fields. I originally offered YAML that applies to the new API, not the one you are using. I've updated with the correct syntax for the basic v1 API, and further down I've added the newer API (which currently relies on gcp-types to deploy).
resources:
  - name: myclus
    type: container.v1.cluster
    properties:
      projectId: [project]
      zone: us-central1-f
      cluster:
        name: my-clus
        zone: us-central1-f
        network: [network_name]
        subnetwork: [subnet] ### leave this field blank if using the default network
        initialClusterVersion: "1.13"
        nodePools:
          - name: my-clus-pool1
            initialNodeCount: 0
            config:
              imageType: cos
  - name: my-pool-1
    type: container.v1.nodePool
    properties:
      projectId: [project]
      zone: us-central1-f
      clusterId: $(ref.myclus.name)
      nodePool:
        name: my-clus-pool2
        initialNodeCount: 0
        version: "1.13"
        config:
          imageType: ubuntu
The newer API (which provides more functionality and lets you use more features, including the v1beta1 API and beta features) would look something like this:
resources:
  - name: myclus
    type: gcp-types/container-v1:projects.locations.clusters
    properties:
      parent: projects/shared-vpc-231717/locations/us-central1-f
      cluster:
        name: my-clus
        zone: us-central1-f
        network: shared-vpc
        subnetwork: local-only ### leave this field blank if using the default network
        initialClusterVersion: "1.13"
        nodePools:
          - name: my-clus-pool1
            initialNodeCount: 0
            config:
              imageType: cos
  - name: my-pool-2
    type: gcp-types/container-v1:projects.locations.clusters.nodePools
    properties:
      parent: projects/shared-vpc-231717/locations/us-central1-f/clusters/$(ref.myclus.name)
      nodePool:
        name: my-clus-separate-pool
        initialNodeCount: 0
        version: "1.13"
        config:
          imageType: ubuntu
Another note: you may want to modify your scopes. The current scopes will not allow you to pull images from gcr.io, so some system pods may not spin up properly, and if you are using Google's container registry you will be unable to pull those images.
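For example, adding the read-only storage scope (or the broad cloud-platform scope) to the node pool config lets nodes pull from gcr.io; a sketch of the adjusted list:

config:
  oauthScopes:
    - https://www.googleapis.com/auth/logging.write
    - https://www.googleapis.com/auth/monitoring
    - https://www.googleapis.com/auth/devstorage.read_only # allows gcr.io image pulls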
Finally, you don't want to repeat the node pool resource both in the cluster spec and separately. Instead, create the cluster with a basic (default) node pool and create all additional node pools as separate resources, so you can manage them without going through the cluster. There are very few updates you can perform on a node pool aside from resizing.