Monitor Calico network policies behavior - kubernetes

How could I monitor the network policies behavior?
I have a k8s cluster with calico as SDN.
For example I create a network policy to deny traffic from a set of IPs.
I try to make some connections from those IPs and they fail.
Where can I see that the traffic is being rejected because of a Network Policy?
Thank you.

There is no such possibility by default, but you can try to follow this instruction to create a user interface that shows blocked and allowed connections in real time.
Also, Getting started with Calico could be useful.
You can find Calico logs in the /var/log/calico folder inside the Calico pod.
You can find more about logging here: Calico Logging.
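One option worth checking, if you are able to use Calico's own policy resources (projectcalico.org/v3) rather than only the Kubernetes-native NetworkPolicy, is Calico's Log rule action, which records matching packets before they are denied. A minimal sketch, assuming calicoctl is configured for your datastore; the namespace, label selector and CIDR below are placeholders:

# Sketch only: log and then deny ingress from an example CIDR with a Calico
# NetworkPolicy. The namespace, selector and CIDR are placeholders.
calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: log-and-deny-blocked-ips
  namespace: default
spec:
  selector: app == 'my-app'
  types:
    - Ingress
  ingress:
    - action: Log        # matching packets are logged on the node
      source:
        nets:
          - 192.0.2.0/24
    - action: Deny       # and then dropped
      source:
        nets:
          - 192.0.2.0/24
EOF

The logged packets typically end up in the node's kernel log (dmesg or /var/log/syslog), which gives you a concrete trace that the connection was rejected by the policy rather than by something else.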

Related

Is there a way to Prevent inter-namespace communication of pods in Kubernetes without using network policy

I am setting up a hybrid cluster (CentOS master and 2 Windows 2019 worker nodes) with containerd as the runtime. I cannot use CNIs like Calico and Weave, as they need Docker as the runtime. I can use Flannel, but it does not support network policies well. Is there a way to prevent inter-namespace communication of pods in Kubernetes WITHOUT using network policy?
Is there a way to prevent inter-namespace communication of pods in Kubernetes WITHOUT using network policy?
Network policies were created for that exact purpose, and as per the documentation you need a CNI that supports them. Otherwise they will be ignored.
Network policies are implemented by the network plugin. To use network policies, you must be using a networking solution which supports NetworkPolicy. Creating a NetworkPolicy resource without a controller that implements it will have no effect.
If your only option is to use Flannel for networking, you can install Calico network policy enforcement to secure cluster communications. So you are basically installing Calico for policy and Flannel for networking, a combination commonly known as Canal. You can find more details in the Calico docs.
Here's also a good answer on how to set up Calico with containerd that you might find useful for your case.
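Once a policy-capable combination like Canal is in place, the kind of policy that blocks cross-namespace traffic is quite short. A minimal sketch (the namespace name and empty selectors are illustrative, not specific to your cluster):

# Sketch: default-deny ingress from other namespaces. Pods in "team-a" only
# accept traffic from pods in the same namespace.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: team-a
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}  # only pods from the same namespace may connect
EOF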
As Flannel is an L2 networking solution only, and thus has no support for NetworkPolicy (L3/L4), you can instead implement security at the service level (any form of authorization like user/pass, certificates, SAML, OAuth, etc.).
But without NetworkPolicy you will lose firewall-like security, which may not be what you want.

Azure Ingress TCP Forward Network Security Group

I have created an Ingress service that forwards TCP port 22 to a service in my cluster. As is, all inbound traffic is allowed.
What I would like to know is whether it is possible to define NSG rules that restrict access to a certain subnet only. I was able to define that rule using the Azure interface. However, every time the Ingress service is edited, those Network Security Group rules get reverted.
Thanks!
I think there may be some misunderstanding about the NSG in AKS. First, let's take a look at AKS networking: Kubernetes uses Services to logically group a set of pods together and provide network connectivity. See AKS Services for more details. When you create Services, the Azure platform automatically configures any network security group rules that are needed.
Don't manually configure network security group rules to filter traffic for pods in an AKS cluster.
See NSG in AKS for more details. So in this situation, you do not need to manage the NSG rules manually.
But don't worry, you can also manage the rules for your pods manually if you want. See Secure traffic between pods using network policies in Azure Kubernetes Service. You can install the Calico network policy engine and create Kubernetes network policies to control the flow of traffic between pods in AKS. Although it is still a preview feature, it can also help you achieve what you want. But remember, network policy can only be enabled when the cluster is created.
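As a rough sketch of what that looks like (resource group, cluster name and node count are placeholders):

# Sketch: create an AKS cluster with the Calico network policy engine enabled.
# Network policy cannot be turned on for an existing cluster.
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --node-count 2 \
  --generate-ssh-keys \
  --network-plugin azure \
  --network-policy calico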
Yes! This is most definitely possible. Azure NSGs apply to subnets and NICs. You can define the CIDR in the NSG rule to allow/deny traffic on the desired port and apply it to the NIC and subnet. A word of caution: make sure to have matching rules at the subnet and NIC level if the cluster is within the same subnet, otherwise the traffic will be blocked internally and won't go out. This doc describes them best: https://blogs.msdn.microsoft.com/igorpag/2016/05/14/azure-network-security-groups-nsg-best-practices-and-lessons-learned/.
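A hedged example of such a pair of rules via the Azure CLI (resource group, NSG name, priorities and CIDRs are all placeholders):

# Sketch: allow TCP/22 only from a management subnet and deny it from
# everywhere else. Adjust names, priorities and prefixes to your environment.
az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myNodeNSG \
  --name allow-ssh-from-mgmt-subnet \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 22 \
  --source-address-prefixes 10.0.1.0/24

az network nsg rule create \
  --resource-group myResourceGroup \
  --nsg-name myNodeNSG \
  --name deny-ssh-from-everywhere-else \
  --priority 200 \
  --direction Inbound \
  --access Deny \
  --protocol Tcp \
  --destination-port-ranges 22 \
  --source-address-prefixes '*'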

DNS problem on AWS EKS when running in private subnets

I have an EKS cluster setup in a VPC. The worker nodes are launched in private subnets. I can successfully deploy pods and services.
However, I'm not able to perform DNS resolution from within the pods. (It works fine on the worker nodes, outside the container.)
Troubleshooting using https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/ results in the following from nslookup (timeout after a minute or so):
Server: 172.20.0.10
Address 1: 172.20.0.10
nslookup: can't resolve 'kubernetes.default'
When I launch the cluster in an all-public VPC, I don't have this problem. Am I missing any necessary steps for DNS resolution from within a private subnet?
Many thanks,
Daniel
I feel like I have to give this a proper answer because coming upon this question was the answer to 10 straight hours of debugging for me. As @Daniel said in his comment, the issue I found was with my ACL blocking outbound traffic on UDP port 53, which Kubernetes uses to resolve DNS records.
The process was especially confusing for me because one of my pods actually worked the entire time, since (I think?) it happened to be in the same zone as the Kubernetes DNS resolver.
To elaborate on the comment from @Daniel, you need:
an ingress rule for UDP port 53
an ingress rule for UDP on ephemeral ports (e.g. 1025–65535)
I hadn't added (2) and was seeing CoreDNS receiving requests and trying to respond, but the response wasn't getting back to the requester.
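A hedged sketch of those two rules with the AWS CLI (the ACL ID, rule numbers and CIDR are placeholders; 17 is the protocol number for UDP):

# Sketch: allow DNS queries into the subnet (UDP 53) and allow the replies back
# in on ephemeral ports, since network ACLs are stateless.
aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --ingress --rule-number 100 --rule-action allow \
  --protocol 17 --port-range From=53,To=53 \
  --cidr-block 10.0.0.0/16

aws ec2 create-network-acl-entry \
  --network-acl-id acl-0123456789abcdef0 \
  --ingress --rule-number 110 --rule-action allow \
  --protocol 17 --port-range From=1025,To=65535 \
  --cidr-block 10.0.0.0/16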
A tip for others dealing with these kinds of issues: turn on CoreDNS logging by adding the log configuration to the ConfigMap, which I was able to do with kubectl edit configmap -n kube-system coredns. See the CoreDNS docs on this: https://github.com/coredns/coredns/blob/master/README.md#examples. This can help you figure out whether the issue is CoreDNS receiving queries or sending the response back.
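For reference, a minimal sketch of enabling that (the exact Corefile contents will differ per cluster):

# Sketch: add the `log` plugin to the Corefile held in the coredns ConfigMap.
kubectl -n kube-system edit configmap coredns
# Inside the `.:53 { ... }` block, add a line containing just `log`, e.g.:
#
#   .:53 {
#       log        # log every query and the response code
#       errors
#       health
#       ...
#   }
#
# If CoreDNS does not pick up the change on its own, restart the pods:
kubectl -n kube-system rollout restart deployment coredns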
I ran into this as well. I have multiple node groups, and each one was created from a CloudFormation template. The CloudFormation template created a security group for each node group that allowed the nodes in that group to communicate with each other.
The DNS error resulted from Pods running in separate node groups from the CoreDNS Pods, so the Pods were unable to reach CoreDNS (network communications were only permitted within node groups). I will make a new CloudFormation template for the node security group so that all the node groups in my cluster can share the same security group.
I resolved the issue for now by allowing inbound UDP traffic on port 53 for each of my node group security groups.
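A hedged example of that workaround with the AWS CLI (both security group IDs are placeholders; repeat per node group as needed):

# Sketch: let nodes in one node group reach CoreDNS pods running on nodes that
# belong to another node group's security group.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaaaaaaaaaaaaaaa \
  --protocol udp --port 53 \
  --source-group sg-0bbbbbbbbbbbbbbbb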
So I have been struggling with this issue for a couple of hours as well, I think; I lost track of time.
Since I am using the default VPC but with the worker nodes inside a private subnet, it wasn't working.
I went through the amazon-vpc-cni-k8s repository and found the solution.
We have to set the environment variable AWS_VPC_K8S_CNI_EXTERNALSNAT=true on the aws-node daemonset.
You can either get the new YAML and apply it, or just fix it through the dashboard. However, for it to work you have to restart the worker node instance so the IP route tables are refreshed.
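One hedged way to do that directly from the command line (assuming kubectl access to kube-system):

# Sketch: set the external SNAT flag on the aws-node daemonset in place.
kubectl -n kube-system set env daemonset aws-node AWS_VPC_K8S_CNI_EXTERNALSNAT=true

# Verify it took effect:
kubectl -n kube-system describe daemonset aws-node | grep AWS_VPC_K8S_CNI_EXTERNALSNAT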
The issue link is here.
Thanks!
Re: AWS EKS Kube Cluster and Route53 internal/private Route53 queries from pods
Just wanted to post a note on what we needed to do to resolve our issues. Note that YMMV and everyone has different environments and resolutions, etc.
Disclaimer:
We're using the community terraform eks module to deploy/manage vpcs and the eks clusters. We didn't need to modify any security groups. We are working with multiple clusters, regions, and VPC's.
ref:
Terraform EKS module
CoreDNS Changes:
We have a DNS relay for private internal zones, so we needed to modify the CoreDNS ConfigMap and add in the DNS relay IP address:
...
ec2.internal:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
foo.dev.com:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
foo.stage.com:53 {
    errors
    cache 30
    forward . 10.1.1.245
}
...
VPC DHCP option sets:
Update with the IP of the above relay server if applicable. This requires regenerating the option set, as they cannot be modified.
Our DHCP options set looks like this:
["AmazonProvidedDNS", "10.1.1.245", "169.254.169.253"]
ref: AWS DHCP Option Sets
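A hedged sketch of regenerating and attaching the option set with the AWS CLI (the relay IP matches the example above; the IDs are placeholders):

# Sketch: DHCP option sets are immutable, so create a new one with the extra
# resolver and associate it with the cluster's VPC.
aws ec2 create-dhcp-options \
  --dhcp-configurations \
    'Key=domain-name-servers,Values=AmazonProvidedDNS,10.1.1.245,169.254.169.253'

aws ec2 associate-dhcp-options \
  --dhcp-options-id dopt-0123456789abcdef0 \
  --vpc-id vpc-0123456789abcdef0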
Route-53 Updates:
Associate every Route 53 zone with the VPC ID that it needs to be associated with (where our kube cluster resides and the pods make queries from).
There is also a Terraform resource for that:
https://www.terraform.io/docs/providers/aws/r/route53_zone_association.html
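The same association can also be done with the AWS CLI if Terraform doesn't manage that piece for you (zone and VPC IDs are placeholders):

# Sketch: associate a private hosted zone with the VPC the cluster lives in.
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id Z0123456789EXAMPLE \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0123456789abcdef0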
We ran into a similar issue where DNS resolution timed out on some of the pods, but re-creating the pod a couple of times resolved the problem. Also, it was not every pod on a given node showing issues, only some pods.
It turned out to be due to a bug in version 1.5.4 of the Amazon VPC CNI; more details here: https://github.com/aws/amazon-vpc-cni-k8s/issues/641.
The quick solution is to revert to the recommended version 1.5.3: https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html
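A quick hedged check for which CNI version a cluster is actually running:

# Sketch: the image tag on the aws-node daemonset shows the VPC CNI version.
kubectl -n kube-system describe daemonset aws-node | grep amazon-k8s-cni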
Like many others, I've been struggling with this bug for a few hours.
In my case the issue was this bug https://github.com/awslabs/amazon-eks-ami/issues/636 that basically sets up an incorrect DNS when you specify endpoint and certificate but not certificate.
To confirm, check:
That you have connectivity (NACL and security groups) allowing DNS on TCP and UDP. For me the best way was to SSH into a node and see if it resolves (nslookup). If it doesn't resolve, it is most likely either the NACL or an SG, but also check that the DNS nameserver on the node is well configured.
If you can get name resolution on the node, but not inside the pod, check that the nameserver in /etc/resolv.conf points to an IP in your service network (if you see 172.20.0.10, your service network should be 172.20.0.0/24 or so).
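A sketch of those checks in command form (the test image and pod name are arbitrary):

# 1. On a worker node (over SSH): does DNS resolve at all, and which nameserver
#    is the node using?
nslookup amazon.com
cat /etc/resolv.conf

# 2. From inside a throwaway pod: does /etc/resolv.conf point into the service
#    CIDR (e.g. 172.20.0.10), and does that resolver answer?
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  sh -c 'cat /etc/resolv.conf; nslookup kubernetes.default'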

Implementing iptables rules on Kubernetes nodes

I would like to implement my own iptables rules before Kubernetes (kube-proxy) starts doing its magic and dynamically creating rules based on the services/pods running on the node. kube-proxy is running in --proxy-mode=iptables.
Whenever I try to load rules when booting up the node, for example in the INPUT chain, the Kubernetes rules (KUBE-EXTERNAL-SERVICES and KUBE-FIREWALL) are inserted at the top of the chain even though my rules were also added with the -I flag.
What am I missing or doing wrong?
If it is somehow related, I am using weave-net plugin for the pod network.
The most common practice is to put all custom firewall rules on the gateway (ADC) or into cloud security groups. The rest of the cluster security is implemented by other features, like Network Policy (this depends on the network provider), Ingress, RBAC and others.
Check out the articles about Securing a Cluster and Kubernetes Security - Best Practice Guide.
These articles can also be helpful to secure your cluster:
Hardening your cluster's security
The Ultimate Guide to Kubernetes Security

Any way to access Calico network by non-Calico nodes

I am very new to Calico and Calico networking; so far I have gone through the Calico docs.
My question is: is there any way to access the Calico network from non-Calico nodes?
I went through all the docs but haven't found a solution. Am I missing something?
If you check the documentation here, https://docs.projectcalico.org/v2.6/usage/external-connectivity, you will find it mentioned in the Inbound connectivity part:
BGP peering into your network infrastructure, or using orchestrator specific options.
But if you want simple connectivity, a better option is to run the calico/node service on the non-Calico node; the calicoctl command line tool can be used to launch the calico/node container, configured to connect to the datastore being used.
That will cause the routes to be distributed to the host, which would then be able to access the workloads.
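As a hedged sketch of that second approach (Calico v2.x era, matching the linked docs; the etcd endpoint and image tag are placeholders):

# Sketch: start the calico/node agent on a non-Calico host so it joins the BGP
# mesh and learns routes to the workloads.
export ETCD_ENDPOINTS=http://etcd.internal.example.com:2379
sudo -E calicoctl node run --node-image=quay.io/calico/node:v2.6.12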
Found similar ref: https://github.com/projectcalico/calico/issues/858
Hope this helps you