Connection to external server (MongoDB) fails from Fargate container deployed using CDK - amazon-ecs

I created a simple Node.js/Express app, built a Docker image, and successfully pushed it to AWS ECR.
Next, I created a CDK project to deploy this container to Fargate with a public Application Load Balancer, using ecs_patterns.ApplicationLoadBalancedFargateService.
Although the deployment command (cdk deploy) was successful, the cluster page in the AWS console shows "No tasks running", the Services tab within the cluster shows a red bar with "0/1 Tasks running", and the Tasks tab shows tasks being created and stopped: every minute or two a task is created, eventually stopped, and a new one is created, and this goes on forever.
Opening a stopped task and its Log tab shows:
ERROR: Connecting to MongoDB failed. Please check if MongoDB server is running at the correct host/port.
This is the error message my app prints when the connection to MongoDB fails during server initialization.
The DB credentials and connection URL are valid (see below); the database runs on a separate EC2 instance with an Elastic IP and a domain name. In fact, I can connect to the DB from my dev machine, which is outside AWS.
For comparison, I also created a stack manually through the console, creating security groups (for the load balancer and the service), target group, Application Load Balancer, listener (port 80, HTTP), cluster, task definition (with the correct DB credentials set in an env var), service, etc., and it works without any issue.
All I want is to create a similar stack using CDK (I don't want to create/maintain it manually).
Any clue on why the connection to an external server/DB fails from a Fargate container would be very useful. I'm unable to compare the CDK-generated CloudFormation template (which doesn't work) with the manually created stack (which works), as there are too many items in the autogenerated template.
Here is the CDK code, based on AWS sample code:
const vpc = new ec2.Vpc(this, "MyVpc", { maxAzs: 2 });
const cluster = new ecs.Cluster(this, "MyCluster", { vpc });
const logDriver = ecs.LogDriver.awsLogs({ streamPrefix: "api-log" });
const ecrRepo = ecr.Repository.fromRepositoryName(this, "app-ecr", "abcdef");

new ecs_patterns.ApplicationLoadBalancedFargateService(this, "FargateService", {
  assignPublicIp: true,
  cluster,
  desiredCount: 1,
  memoryLimitMiB: 1024,
  cpu: 512,
  taskImageOptions: {
    containerName: "api-container",
    image: ecs.ContainerImage.fromEcrRepository(ecrRepo),
    enableLogging: true,
    logDriver,
    environment: { MONGO_DB_URL: process.env.DB_URL as string }
  },
  publicLoadBalancer: true,
  loadBalancerName: "api-app-lb",
  serviceName: "api-service"
});

It turned out to be a silly mistake! The environment variable should be named DB_URL instead of MONGO_DB_URL, because DB_URL is what my Node.js/Express server inside the container actually reads.
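A minimal startup check would have surfaced this mismatch immediately instead of producing a generic connection error. This is a sketch only; getMongoUrl is a hypothetical helper, not part of the original app:

```typescript
// Hypothetical startup check: the CDK stack set MONGO_DB_URL,
// but the app reads DB_URL. Failing fast with the variable name
// in the error message makes the mismatch obvious in the task logs.
function getMongoUrl(env: Record<string, string | undefined>): string {
  const url = env.DB_URL; // must match the key set in taskImageOptions.environment
  if (!url) {
    throw new Error(
      "DB_URL is not set; check that the task definition uses the exact variable name the app reads"
    );
  }
  return url;
}
```

Calling it with `process.env` at server startup makes the task stop with a clear message in the Log tab rather than a MongoDB timeout.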

Related

Can't submit new job via gui on standalone kubernetes flink deployment (session mode)

After deploying Flink in standalone Kubernetes mode (session cluster), I can't upload any new job using the Flink GUI. After clicking the +Add New button and choosing a jar file, the progress bar finishes and nothing happens.
There is no information/error about this in the Job Manager logs.
When I try to upload any other kind of file (e.g. a text file) I get an error, and there is a message in the log:
"Exception occured in REST handler: Only Jar files are allowed."
I've also tried to upload a fake jar (an empty file called .jar) and it works - I can upload this kind of file.
I have a brand new, clean Apache Flink cluster running on a Kubernetes cluster.
I have used the Docker Hub image and tried two different versions:
1.13.2-scala_2.12-java8, and
1.13-scala_2.11-java8
The result was the same on both versions.
My deployment is based on this howto:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/resource-providers/standalone/kubernetes/
and I've used the yaml files provided in the "Common cluster resource definitions" appendix of that article:
flink-configuration-configmap.yaml
jobmanager-service.yaml
taskmanager-session-deployment.yaml
jobmanager-session-deployment-non-ha.yaml
I've also used an ingress controller to publish the GUI running on port 8081 on the jobmanager.
I have three pods (1 job manager, 2 task managers) and can't see any errors in the Flink logs.
Any suggestions on what I'm missing, or where to find any errors?
Problem solved. It was caused by the nginx upload limit (the default is 1024kb). The Flink GUI is published outside Kubernetes using an ingress controller and nginx.
When we tried to upload job files bigger than 1MB (1024kb), the nginx limit prevented it. Jobs below this limit (for example the fake 0 kb jar) were uploaded successfully, which is why only real jars appeared to fail.
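With the kubernetes/ingress-nginx controller, the limit can be raised per Ingress with an annotation. This is a sketch; the host name is a placeholder, and the service name/port follow the jobmanager-service.yaml from the howto above:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: flink-jobmanager
  annotations:
    # Raise the body-size limit from nginx's 1m default; "0" disables the check
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
spec:
  rules:
    - host: flink.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flink-jobmanager   # service exposing the GUI on 8081
                port:
                  number: 8081
```

If the ingress controller is plain nginx rather than ingress-nginx, the equivalent directive is client_max_body_size in the nginx config.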

ECS get image from QUAY.io and spin ec2Spot: Infinitely waiting for task to start - desiredCount = 1, pendingCount = 0

I've set up a pipeline which talks to ECS and spins up an EC2 Spot instance.
I'm getting stuck on the following message:
PRIMARY task ******:5 - runningCount = 0 , desiredCount = 1, pendingCount = 0
which basically means that I'm waiting for the task to start, but something is off in the setup and it never starts. Any suggestions on where to look?
Notes:
This is a testing app which spins up a browser, so no ports are required
No load balancer
Possibly the quay.io integration is misconfigured, but I can't figure it out with no logs
The CloudTrail log is empty, with only success messages upon taskDefinition create and update
Thanks
After about 8 hours of hammering my head against the wall, the issue was solved. It had been answered long ago by this fella: https://stackoverflow.com/a/36533601/5332494
Steps it took me to figure it out:
Look in CloudTrail => Event history => Event name column (UpdateService) => click on View event => find the error message there ("was unable to place a task because no container instance met all of its requirements. Reason: No Container Instances were found in your cluster. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide"), which will take you to https://docs.aws.amazon.com/AmazonECS/latest/developerguide/service-event-messages.html#service-event-messages-1
The page in the link above lists the possible issues if you got the same message as I did (see step 1). The first option on that page:
No container instances were found in your cluster
took me to https://docs.aws.amazon.com/AmazonECS/latest/developerguide/launch_container_instance.html
That's where I added a container instance to my ECS cluster and was finally able to add an EC2 Spot instance through the Codefresh pipeline talking to ECS.
Notes:
ECS had to talk to QUAY.io to pull the Docker image from their private registry. All I had to do was create a secret in AWS Secrets Manager with the following format:
{
  "username": "your-Quay-Username",
  "password": "your-Quay-password"
}
That's it :)
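To wire that secret up, the container definition in the task definition references it via repositoryCredentials. A sketch, with the image name and secret ARN as placeholders:

```json
{
  "containerDefinitions": [
    {
      "name": "test-app",
      "image": "quay.io/your-org/your-image:latest",
      "repositoryCredentials": {
        "credentialsParameter": "arn:aws:secretsmanager:us-east-1:123456789012:secret:quay-creds"
      }
    }
  ]
}
```

The task execution role also needs permission to read that secret, or the image pull will fail with an authentication error.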

Running two containers on Fargate using CDK

I'd like to use Fargate to run two containers: one for the main project's backend, and another for the database (MongoDB). The basic example included in the GitHub repo shows how to run a single container on Fargate using CDK, but I still have 2 issues:
The example doesn't show how to run two containers.
I'd like to scale the database containers but have them share the data storage (so that the data gets stored in a central place and stays synchronized between the different containers).
I've figured out how to (sort of) fix the first issue, similarly to how ecs.LoadBalancedFargateService is implemented, but the second issue still remains.
For reference, this is what I have so far in stack.ts (the rest is the basic boilerplate cdk init app --language typescript generates for you):
import cdk = require("@aws-cdk/cdk");
import ec2 = require("@aws-cdk/aws-ec2");
import ecs = require("@aws-cdk/aws-ecs");
import elbv2 = require("@aws-cdk/aws-elasticloadbalancingv2");

const { ApplicationProtocol } = elbv2;

export class AppStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // Create VPC and Fargate Cluster
    const vpc = new ec2.VpcNetwork(this, "FargateVPC", {
      maxAZs: 2
    });
    const cluster = new ecs.Cluster(this, "Cluster", { vpc });

    // Create task definition
    const fargateTaskDefinition = new ecs.FargateTaskDefinition(this, "FargateTaskDef", {
      memoryMiB: "512",
      cpu: "256"
    });

    // Create container from local `Dockerfile`
    const appContainer = fargateTaskDefinition.addContainer("Container", {
      image: ecs.ContainerImage.fromAsset(this, "Image", {
        directory: ".."
      })
    });
    // Set port mapping
    appContainer.addPortMappings({
      containerPort: 5000
    });

    // Create container from DockerHub image
    const mongoContainer = fargateTaskDefinition.addContainer("MongoContainer", {
      image: ecs.ContainerImage.fromDockerHub("mongo")
    });
    // Set port mapping
    mongoContainer.addPortMappings({
      containerPort: 27017
    });

    // Create service
    const service = new ecs.FargateService(this, "Service", {
      cluster,
      taskDefinition: fargateTaskDefinition,
      desiredCount: 2
    });

    // Configure task auto-scaling
    const scaling = service.autoScaleTaskCount({
      maxCapacity: 5
    });
    scaling.scaleOnCpuUtilization("CpuScaling", {
      targetUtilizationPercent: 70
    });

    // Create an internet-facing load balancer
    const loadBalancer = new elbv2.ApplicationLoadBalancer(this, "AppLB", {
      vpc,
      internetFacing: true
    });
    // Allow incoming connections
    loadBalancer.connections.allowFromAnyIPv4(new ec2.TcpPort(5000), "Allow inbound HTTP");

    // Create a listener and listen to incoming requests
    const listener = loadBalancer.addListener("Listener", {
      port: 5000,
      protocol: ApplicationProtocol.Http
    });
    listener.addTargets("ServiceTarget", {
      port: 5000,
      protocol: ApplicationProtocol.Http,
      targets: [service]
    });

    // Output the DNS where you can access your service
    new cdk.Output(this, "LoadBalancerDNS", {
      value: loadBalancer.dnsName
    });
  }
}
Thanks in advance.
Generally, running a database in a Fargate container is not recommended since there is not currently a good solution for persisting data. You could integrate a hook that copies data into something like S3 prior to a task stopping, but generally those kinds of solutions are very fragile and not recommended.
You may want to check out DocumentDB as an alternative to running your own MongoDB instances, though support for DocumentDB constructs in the CDK is not yet fully fleshed out.
Another alternative is to run regular ECS tasks and attach an EBS volume on your EC2 Instance. Then you can use docker volumes to mount the EBS volume to your container. With this approach, you'll need to tag the instance metadata and use an ECS placement constraint to ensure that your task gets placed on the instance that has the EBS volume attached.
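As a sketch of that EC2-based approach, an EC2 task definition can declare a Docker volume and mount it into the MongoDB container. This assumes a Docker volume plugin such as rexray/ebs is installed on the container instances; all names, sizes, and the placement expression below are placeholders:

```json
{
  "requiresCompatibilities": ["EC2"],
  "containerDefinitions": [
    {
      "name": "mongo",
      "image": "mongo",
      "memory": 512,
      "mountPoints": [
        { "sourceVolume": "mongo-data", "containerPath": "/data/db" }
      ]
    }
  ],
  "volumes": [
    {
      "name": "mongo-data",
      "dockerVolumeConfiguration": {
        "driver": "rexray/ebs",
        "scope": "shared",
        "autoprovision": true,
        "driverOpts": { "volumetype": "gp2", "size": "10" }
      }
    }
  ],
  "placementConstraints": [
    { "type": "memberOf", "expression": "attribute:ecs.availability-zone == us-east-1a" }
  ]
}
```

The placement constraint pins the task to the availability zone of the EBS volume, since EBS volumes cannot attach across zones.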
If either of these approaches works for you, feel free to open a feature request on the CDK repository. Hope this helps!
Is AWS Fargate a hard requirement?
If not, you could opt for plain ECS + EC2, which supports the use of persistent data volumes:
Fargate tasks only support nonpersistent storage volumes.
For EC2 tasks, use data volumes in the following common examples:
To provide persistent data volumes for use with a container
To define an empty, nonpersistent data volume and mount it on multiple containers
To share defined data volumes at different locations on different containers on the same container instance
To provide a data volume to your task that is managed by a third-party volume driver
I haven't tried it myself, but it seems that CDK has stable support for ECS + EC2.
PS: the link to the basic example is broken; I tried to find the new location in the new example repository, but without success.

mongodb mms monitoring agent does not find group members

I have installed the latest MongoDB MMS agent (6.5.0.456) on Ubuntu 16.04 and initialised the replica set. Hence I am running a single-node replica set with the monitoring agent enabled. The agent works fine; however, it does not seem to actually find the replica set member:
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:170] Received new configuration: Primary agent, Assigned 0 out of 0 plus 0 chunk monitor(s)
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Iterate:182] Nothing to do. Either the server detected the possibility of another monitoring agent running, or no Hosts are configured on the Group.
[2018/05/26 18:30:30.222] [agent.info] [components/agent.go:Run:199] Done. Sleeping for 55s...
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:746] Performing discovery with 0 hosts
[2018/05/26 18:30:30.222] [discovery.monitor.info] [components/discovery.go:discover:803] Received discovery responses from 0/0 requests after 891ns
I can see two processes for monitor agents:
/bin/sh -c /usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config >> /var/log/mongodb-mms/monitoring-agent.log 2>&1
/usr/bin/mongodb-mms-monitoring-agent -conf /etc/mongodb-mms/monitoring-agent.config
However if I terminate one, it also tears down the other, so I do not think that is the problem.
So the question is: what is the Group the agent is referring to? Where is that configured? Or how do I find out which Group the agent refers to and check whether the group is configured correctly?
The rs.config() output looks fine, with one replica set member whose host field looks just fine. I can use that value to connect to the instance using the mongo command. No auth is configured.
EDIT
It looks like Cloud Manager now needs to be configured with a seed host; then it starts to discover all the other nodes in the replica set. This seems to be different from pre-Cloud-Manager days, where the agent was able to track the rs on its own, if I remember correctly... There is probably still an easier way to get this done, so I am leaving this question open for now...
So, question is what is the Group that the agent is referring to. Where is that configured? Or how do I find out which Group the agent refers to and how do I check if the group is configured correctly.
Configuration values for the Cloud Manager agent (such as mmsGroupId and mmsApiKey) are set in the config file, which is /etc/mongodb-mms/monitoring-agent.config by default. The agent needs this information in order to communicate with the Cloud Manager servers.
For more details, see Install or Update the Monitoring Agent and Monitoring Agent Configuration in the Cloud Manager documentation.
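As a sketch, the relevant lines of /etc/mongodb-mms/monitoring-agent.config look like this; the values are placeholders, and the group ID and API key come from the Cloud Manager UI:

```ini
# Identifies which Cloud Manager project (group) this agent reports to
mmsGroupId=your-group-id
# API key generated for the project in the Cloud Manager UI
mmsApiKey=your-api-key
# Endpoint the agent communicates with
mmsBaseUrl=https://cloud.mongodb.com
```

If mmsGroupId points at the wrong project, the agent will report "0 hosts" exactly as in the logs above, because no hosts are configured in that group.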
It kind of looks that the cloud manager now needs to be configured with the seed host. Then it starts to discover all the other nodes in the replicaset.
Unless a MongoDB process is already managed by Cloud Manager automation, I believe it has always been the case that you need to add an existing MongoDB process to monitoring to start the process of initial topology discovery. Once a deployment is monitored, any changes in deployment membership should automatically be discovered by the Cloud Manager agent.
Production deployments should have authentication and access control enabled, so in addition to adding a seed hostname and port via the Cloud Manager UI, you usually need to provide appropriate credentials.

Azure Service Fabric Cluster Update

I have a cluster in Azure that failed to update automatically, so I'm trying a manual update. I tried via the portal; it failed, so I kicked off an update using PowerShell, which also failed. The update starts, then just sits at "UpdatingUserConfiguration", and after an hour or so fails with a timeout. I have removed all application types and checked my certs for "NETWORK SERVICE". The cluster is 5 VMs, single node type, Windows.
Error
Set-AzureRmServiceFabricUpgradeType : Code: ClusterUpgradeFailed,
Message: Cluster upgrade failed. Reason Code: 'UpgradeDomainTimeout',
Upgrade Progress:
{
  "upgradeDescription": {
    "targetCodeVersion": "6.0.219.9494",
    "targetConfigVersion": "1",
    "upgradePolicyDescription": {
      "upgradeMode": "UnmonitoredAuto",
      "forceRestart": false,
      "upgradeReplicaSetCheckTimeout": "37201.09:59:01",
      "kind": "Rolling"
    }
  },
  "targetCodeVersion": "6.0.219.9494",
  "targetConfigVersion": "1",
  "upgradeState": "RollingBackCompleted",
  "upgradeDomains": [
    { "name": "1", "state": "Completed" },
    { "name": "2", "state": "Completed" },
    { "name": "3", "state": "Completed" },
    { "name": "4", "state": "Completed" }
  ],
  "rollingUpgradeMode": "UnmonitoredAuto",
  "upgradeDuration": "02:02:07",
  "currentUpgradeDomainDuration": "00:00:00",
  "unhealthyEvaluations": [],
  "currentUpgradeDomainProgress": { "upgradeDomainName": "", "nodeProgressList": [] },
  "startTimestampUtc": "2018-05-17T03:13:16.4152077Z",
  "failureTimestampUtc": "2018-05-17T05:13:23.574452Z",
  "failureReason": "UpgradeDomainTimeout",
  "upgradeDomainProgressAtFailure": {
    "upgradeDomainName": "1",
    "nodeProgressList": [
      {
        "nodeName": "_mstarsf10_1",
        "upgradePhase": "PreUpgradeSafetyCheck",
        "pendingSafetyChecks": [ { "kind": "EnsureSeedNodeQuorum" } ]
      }
    ]
  }
}
Any ideas on what I can do about an "EnsureSeedNodeQuorum" error?
The root cause was having only 3 seed nodes in the cluster, as a result of the cluster being built with a VM scale set that had "overprovision" set to true. Lesson learned: remember to set "overprovision" to false.
I ended up deleting the cluster and scale set and recreating them using my stored ARM template.
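In the ARM template, the setting lives on the VM scale set resource. A minimal sketch showing only the relevant property (resource name and apiVersion are illustrative):

```json
{
  "type": "Microsoft.Compute/virtualMachineScaleSets",
  "apiVersion": "2017-12-01",
  "name": "sf-node-type-0",
  "properties": {
    "overprovision": false
  }
}
```

With overprovisioning enabled, Azure briefly creates extra VM instances and deletes the surplus, which can leave Service Fabric with fewer seed nodes than expected.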