AWS ECS 503 Service Temporarily Unavailable error - amazon-ecs

I'm following a tutorial How to Deploy a Dockerised Application on AWS ECS With Terraform and running into a 503 error trying to hit my App.
The App runs fine in a local Container (http://localhost:3000/contacts), but is unreachable via ECS deployment. When I check the AWS Console, I see health checks are failing, so there's no successful deployment of the App.
I've read through / watched a number of tutorials, and they all have the same configuration as in the tutorial mentioned above. I'm thinking something must have changed on the AWS side, but I can't figure it out.
I've also read a number of 503-related posts here, and tried various things such as opening different ports, and setting SG ingress wide open, but to no avail.
If anyone is interested in troubleshooting, and has a chance, here's a link to my code: https://github.com/CumulusCycles/tf-cloud-demo
Thanks for any insights anyone may have on this!
Cheers,
Rob

Your target group is configured to forward traffic to port 80 on the container. Your container is listening on port 3000. You need to modify your target group to forward traffic to the port your container is actually listening on:
resource "aws_lb_target_group" "target_group" {
name = "target-group"
port = 3000
protocol = "HTTP"
target_type = "ip"
vpc_id = "${aws_default_vpc.default_vpc.id}" # Referencing the default VPC
}
Your load balancer port is the only port external users will see. Your load balancer is listening on port 80 so people can hit it over HTTP without specifying a port. When the load balancer receives traffic on that port it forwards it to the target group. The target group receives traffic and then forwards it to an instance in the target group, on the configured port.
It does seem a bit redundant, but you need to specify the port(s) that your container listens on in the ECS task definition, and then configure that same port again in both the target group configuration, and the ECS service's load balancer configuration. You may even need to configure it again in the target group's health check configuration if the default health checks don't work for your application.
Note: If you look at the comments on that blog post you linked, you'll see several people saying the same thing about the target group port mapping.

Related

Is it possible for me to get a log which shows the source IP of requests hitting a NodePort in my Kubernetes cluster?

I have a container with an exposed port in a pod. When I check the log in the containerized app, the source of the requests is always 192.168.189.0 which is a cluster IP. I need to be able to see the original source IP of the request. Is there any way to do this?
I tried modifying the service (externalTrafficPolicy: Local) instead of Cluster but it still doesn't work. Please help.
When you are working on an application or service that needs to know the source IP address you need to know the topology of the network you are using. This means that you need to know how the different layers of loadbalancers or proxies works to deliver the traffic to your service.
Depending on what cloud provider you are using or the loadbalancer you have in front of your application the source IP address should be on a header of the request. The header you have to look for is X-Fordwared-for, more info here, depending on the proxy or loadbalancer you are using sometimes you need to activate this header to receive the correct IP address.

Change Kubernetes Instance Template to open HTTPS port

I was using NodePort to host a webapp on Google Container Engine (GKE). It allows you to directly point your domains to the node IP address, instead of an expensive Google load balancer. Unfortunately, instances are created with HTTP ports blocked by default, and an update locked down manually changing the nodes, as they are now created using and Instance Group/and an Immutable Instance Template.
I need to open port 443 on my nodes, how do I do that with Kubernetes or GCE? Preferably in an update resistant way.
Related github question: https://github.com/nginxinc/kubernetes-ingress/issues/502
Using port 443 on your Kubernetes nodes is not a standard practice. If you look at the docs you and see the kubelet option --service-node-port-range which defaults to 30000-32767. You could change it to 443-32767 or something. Note that every port under 1024 is restricted to root.
In summary, it's not a good idea/practice to run your Kubernetes services on port 443. A more typical scenario would be an external nginx/haproxy proxy that sends traffic to the NodePorts of your service. The other option you mentioned is using a cloud load balancer but you'd like to avoid that due to costs.
Update: A deamonset with a nodeport can handle the port opening for you. nginx/k8s-ingress has a nodeport on 443 which gets exposed by a custom firewall rule. the GCE UI will not show「Allow HTTPS traffic」as checked, because its not using the default rule.
You can do everything you do on the GUI Google Cloud Console using the Cloud SDK, most easily through the Google Cloud Shell. Here is the command for adding a network tag to a running instance. This works, even though the GUI disabled the ability to do so
gcloud compute instances add-tags gke-clusty-pool-0-7696af58-52nf --zone=us-central1-b --tags https-server,http-server
This also works on the beta, meaning it should continue to work for a bit.
See https://cloud.google.com/sdk/docs/scripting-gcloud for examples on how to automate this. Perhaps consider running on a webhook when downtime is detected. Obviously none of this is ideal.
Alternatively, you can change the templates themselves. With this method you can also add a startup to new nodes, which allows you do do things like fire a webhook with the new IP Address for a round robin low downtime dynamic dns.
Source (he had the opposite problem, his problem is our solution): https://stackoverflow.com/a/51866195/370238
If I understand correctly, if nodes can be destroyed and recreated themselves , how are you going to rest assured that certain service behind port reliably available on production w/o any sort of load balancer which takes care of route orchestration diverting port traffic to new node(s)

Change path while forwarding with AWS Elastic Load Balancer

I have a number of containers running in Amazon ECS (in a private subnet) and each serving a different app on port 8080.
I have a public-facing ELB (attached to apps.example.com) forwarding traffic based on the requested path. To illustate, apps.example.com/app1 is forwarded to the target group for the app1 service on port 8080.
The problem I have is that the apps running in the containers are not expecting a path.
Right now, it seems like apps.example.com/app1 is forwarded to private_app1_container:8080/app1 but I need it to be forwarded to private_app1_container:8080.
Is there a way to achieve that?
I am creating the forwarding rules via the aws web interface and while I can forward to a specific target group, I do not see a way to specify the forwarding path. I have thought of redirecting instead of forwarding but my containers are in a private subnet and I would like them to stay isolated.

Restrict aws security groups on kubernetes cluster

I created my kubernetes cluster with specified security group for each ec2 server type, for example for backend server I have backend-sg associated with and a node-sg which is created with the cluster.
Now I try to restrict access to my backend ec2 and open only port 8090 as an inbound and port 8080 as an outbound to a specific security group (lets call it frontend-sg).
I was manage to do so but when changing the inbound port to 8081 in order to check that those restrictions actually worked I was still able to acess port 8080 from the frontend-sg ec2.
I think I am missing something...
Any help would be appreciated
Any help would be appriciated
I will try to illustrate situation in this answer to make it more clear. If I'm understanding your case correctly, this is what you have so far:
Now if you try ports from Frontend EC2 instance to Backend EC2 instance, and they are in same security group (node-sg) you will have traffic there. If you want to check group isolation then you should have one instance outside of node-sg and only in frontend-sg targetting any instance in backend-sg (supposing that both node-sg and backend-sg are not permitting said ports for inbound traffic)...
Finally, a small note... Kubernetes is by default closing all traffic (and you need ingress, loadbalancer, upstream proxy, nodePort or some other means to actually expose your front-facing services) so traditional fine graining of backend/frontend instances and security groups is not that "clearcut" when using k8s, especially since you don't really want to schedule manually (or by labels for that matter) which instances pods will actually run (but instead leave that to k8s scheduler for better unitilization of resources).

Connect to On Premises Service Fabric Cluster

I've followed the steps from Microsoft to create a Multi-Node On-Premises Service Fabric cluster. I've deployed a stateless app to the cluster and it seems to be working fine. When I have been connecting to the cluster I have used the IP Address of one of the nodes. Doing that, I can connect via Powershell using Connect-ServiceFabricCluster nodename:19000 and I can connect to the Service Fabric Explorer website (http://nodename:19080/explorer/index.html).
The examples online suggest that if I hosted in Azure I can connect to http://mycluster.eastus.cloudapp.azure.com:19000 and it resolves, however I can't work out what the equivalent is on my local. I tried connecting to my sample cluster: Connect-ServiceFabricCluster sampleCluster.domain.local:19000 but that returns:
WARNING: Failed to contact Naming Service. Attempting to contact Failover Manager Service...
WARNING: Failed to contact Failover Manager Service, Attempting to contact FMM...
False
WARNING: No such host is known
Connect-ServiceFabricCluster : No cluster endpoint is reachable, please check if there is connectivity/firewall/DNS issue.
Am I missing something in my setup? Should there be a central DNS entry somewhere that allows me to connect to the cluster? Or am I trying to do something that isn't supported On-Premises?
Yup, you're missing a load balancer.
This is the best resource I could find to help, I'll paste relevant contents in the event of it becoming unavailable.
Reverse Proxy — When you provision a Service Fabric cluster, you have an option of installing Reverse Proxy on each of the nodes on the cluster. It performs the service resolution on the client’s behalf and forwards the request to the correct node which contains the application. In majority of the cases, services running on the Service Fabric run only on the subset of the nodes. Since the load balancer will not know which nodes contain the requested service, the client libraries will have to wrap the requests in a retry-loop to resolve service endpoints. Using Reverse Proxy will address the issue since it runs on each node and will know exactly on what nodes is the service running on. Clients outside the cluster can reach the services running inside the cluster via Reverse Proxy without any additional configuration.
Source: Azure Service Fabric is amazing
I have an Azure Service Fabric resource running, but the same rules apply. As the article states, you'll need a reverse proxy/load balancer to resolve not only what nodes are running the API, but also to balance the load between the nodes running that API. So, health probes are necessary too so that the load balancer knows which nodes are viable options for sending traffic to.
As an example, Azure creates 2 rules off the bat:
1. LBHttpRule on TCP/19080 with a TCP probe on port 19080 every 5 seconds with a 2 count error threshold.
2. LBRule on TCP/19000 with a TCP probe on port 19000 every 5 seconds with a 2 count error threshold.
What you need to add to make this forward-facing is a rule where you forward port 80 to your service http port. Then the health probe can be an http probe that hits a path to test a 200 return.
Once you get into the cluster, you can resolve the services normally and SF will take care of availability.
In Azure-land, this is abstracted again to using something like API Management to further reverse proxy it to SSL. What a mess but it works.
Once your load balancer is set up, you'll have a single IP to hit for management, publishing, and regular traffic.