Cluster and node formation in Kubernetes

I am trying to deploy my Docker images using Kubernetes orchestration tools. While reading about Kubernetes, I have gone through the documentation and many YouTube tutorials, but they only cover creating pods and services and writing the corresponding .yml files. I still have some doubts, which I am listing below:
When I am using Kubernetes, how do I create clusters and nodes?
Can I deploy the image from my current docker-compose build directly using only pods? Why do I need to create a services .yml file?
I am new to the containerization, Docker, and Kubernetes world.

My favorite way to create clusters is kubespray, because I find Ansible very easy to read and troubleshoot, unlike more monolithic "run this binary" mechanisms for creating clusters. The kubespray repo has a Vagrant configuration file, so you can even try out a full cluster on your local machine to see what it will do "for real".
But with the popularity of Kubernetes, I'd bet if you ask 5 people you'll get 10 answers to that question, so ultimately pick the one you find easiest to reason about, because almost without fail you will need to debug those mechanisms when something inevitably goes wrong.
The short version, as Hitesh said, is "yes", but the long version is that you need to be careful, because local Docker containers and Kubernetes clusters are trying to solve different problems, and (as a general rule) you cannot easily swap one in place of the other.
As for the second part of your question, a Service in Kubernetes is designed to decouple the current provider of some networked functionality from the long-lived "promise" that such functionality will exist and work. That's because in Kubernetes, the Pods (and Nodes, for that matter) are disposable and subject to termination at almost any time. It would be severely problematic if the consumer of a networked service needed to constantly update its IP address/ports/etc. to account for the coming and going of Pods. This is actually the exact same problem that AWS's Elastic Load Balancers are trying to solve, and Kubernetes will cheerfully provision an ELB to represent a Service if you indicate that is what you would like (and there is similar behavior for other cloud providers).
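For illustration, here is a minimal sketch of such a Service (the name, label, and ports are hypothetical): consumers always talk to the Service, no matter which Pods happen to back it at any given moment.

apiVersion: v1
kind: Service
metadata:
  name: my-api            # hypothetical name that consumers will use
spec:
  selector:
    app: my-api           # matches the labels on whatever Pods currently exist
  ports:
    - port: 80            # stable port consumers connect to
      targetPort: 8080    # port the containers actually listen on
  type: LoadBalancer      # on AWS this provisions an ELB, as described above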
If you are not yet comfortable with containers and Docker as concepts, then I would strongly recommend starting with those topics, and moving on to understanding how Kubernetes interacts with those two things after you have a solid foundation. Otherwise, a lot of the terminology (and even the problems Kubernetes is trying to solve) may continue to seem opaque.

Related

Off-loading of k8s deployments to a different cluster in case of high load

Since I am unable to find anything on google or the official docs, I have a question.
I have a local minikube cluster with deployment, service and ingress, which is working fine. Now when the load on my local cluster becomes too high I want to automatically switch to a remote cluster.
Is this possible?
How would I achieve this?
Thank you in advance
EDIT:
A remote cluster in my case would be a Rancher Kubernetes cluster, but as long as the resources on my local one are sufficient I want to stay there.
So let's say my local cluster has enough resources to run two replicas of my application, but when a third one is needed to distribute the load, it should be deployed to the remote Rancher cluster. (I hope that is clearer now.)
I imagine it would be doable with kubefed (https://github.com/kubernetes-sigs/kubefed) when using the ReplicaSchedulingPreferences (https://github.com/kubernetes-sigs/kubefed/blob/master/docs/userguide.md#replicaschedulingpreference) and just weighting the local cluster very high and the remote one very low and then setting spec.rebalance to true to distribute it in case of high loads, but that approach seems a bit like a workaround.
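To make my idea concrete, I imagine something roughly like the following (the cluster names, namespace, and replica count are just placeholders for my local and Rancher clusters, and I haven't tested this):

apiVersion: scheduling.kubefed.io/v1alpha1
kind: ReplicaSchedulingPreference
metadata:
  name: my-app                 # must match the name of the FederatedDeployment
  namespace: my-namespace
spec:
  targetKind: FederatedDeployment
  totalReplicas: 3
  rebalance: true              # allow replicas to move back when capacity frees up
  clusters:
    local-cluster:
      weight: 100              # strongly prefer the local cluster
    rancher-cluster:
      weight: 1                # spill over to the remote cluster only as needed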
Your idea of using Kubefed sounds good, but there is another option: Multicluster-Scheduler.
Multicluster-scheduler is a system of Kubernetes controllers that
intelligently schedules workloads across clusters. It is simple to use
and simple to integrate with other tools.
To be able to make a better choice for your use case you can read through the Comparison with Kubefed (Federation v2).
All the necessary info can be found in the provided GitHub thread.
Please let me know if that helped.

Where is KOPS located/running from?

I am new to Docker and Kubernetes, though I have mostly figured out how it all works at this point.
I inherited an app that uses both, as well as KOPS.
One of the last things I am having trouble with is the KOPS setup. I know for absolute certain that Kubernetes is set up via KOPS. There are two KOPS state stores in an S3 bucket (corresponding to a dev and a prod cluster respectively).
However, while I can find the server that kubectl/kubernetes is running on, absolutely none of the servers I have access to seem to have a kops command.
Am I misunderstanding how KOPS works? Does it not do some sort of dynamic monitoring (would that just be done by a ReplicaSet by itself?), but rather just set a cluster running and be done?
I can include my cluster.spec or config files, if they're helpful to anyone, but I can't really see how they're super relevant to this question.
I guess I'm just confused - as far as I can tell from my perspective, it looks like KOPS is run once, sets up a cluster, and is done. But then whenever one of my node or master servers goes down, it is self-healing. I would expect that of the node servers, but not the master servers.
This is all on AWS.
Sorry if this is a dumb question, I am just having trouble conceptually understanding what is going on here.
kops is a command-line tool: you run it from your own machine (or a jump box), it creates clusters for you, and it is not a long-running server itself. It's like Terraform if you're familiar with that, but tailored specifically to spinning up Kubernetes clusters.
kops creates nodes on AWS via autoscaling groups. It’s this construct (which is an AWS thing) that ensures your nodes come back to the desired number.
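For example, the node definition kops keeps in your S3 state store is an InstanceGroup, and its size fields are what become the autoscaling group's settings. A hypothetical one (the cluster name, machine type, and sizes are placeholders) looks like:

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
  labels:
    kops.k8s.io/cluster: my-cluster.example.com   # placeholder cluster name
spec:
  role: Node
  machineType: t3.medium      # placeholder instance type
  minSize: 3                  # the autoscaling group keeps at least this many nodes
  maxSize: 3
  subnets:
    - us-east-1a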
kops is used for managing Kubernetes clusters themselves: creating them, scaling, updating, deleting. kubectl is used for managing container workloads that run on Kubernetes; you can create, scale, update, and delete your replica sets with it. How you run workloads on Kubernetes should have nothing to do with how, or with what tool, you (or some cluster admin) manage the Kubernetes cluster itself. That is, unless you're trying to change the "system components" of Kubernetes, like the Kubernetes API server or kube-dns, which are cluster-admin-level concerns but happen to run on top of Kubernetes as container workloads.
As for how pods get spun up when nodes go down, that’s what Kubernetes as a container orchestrator strives to do. You declare the desired state you want, and the Kubernetes system makes it so. If things crash or fail or disappear, Kubernetes aims to reconcile this difference between actual state and desired state, and schedules desired container workloads to run on available nodes to bring the actual state of the world back in line with your desired state. At a lower level, AWS does similar things — it creates VMs and keeps them running. If Amazon needs to take down a host for maintenance it will figure out how to run your VM (and attach volumes, etc.) elsewhere automatically.
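To make "declare the desired state" concrete on the Kubernetes side, a minimal, hypothetical Deployment (the name and image are placeholders) looks like this; if a node holding one of its Pods goes away, the scheduler places a replacement on another node to get back to three replicas:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # hypothetical workload name
spec:
  replicas: 3                        # the desired state: three copies, always
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:1.0   # placeholder image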

Multiple pods using same database on kubernetes

I would like to know if it is possible for multiple pods in the same Kubernetes cluster to access a database which is configured using persistent volumes on a Google cloud persistent disk.
Currently I am building a microservices-architecture web app which has 3 node APIs in different pods, all accessing the same database. So how do I achieve this with Kubernetes?
Kindly let me know if my architecture is right as well.
You can certainly connect multiple node-based app pods to the same database. It is sometimes said that microservices shouldn't share a database, but this depends on what your apps are doing, the project history, and the extent to which you want the parts to be worked on separately.
There are questions you have to answer about running databases at scale, such as your future load and whether you want to use relational databases if you're going to try to span availability zones. And there are some questions specific to Kubernetes, especially around how you associate DB Pods to data. See https://stackoverflow.com/a/53980021/9705485. Another popular option is to use a managed DB service from a cloud provider. If you do run the DB in k8s, then I'd suggest looking for a Helm chart or looking at an operator, such as the KubeDB operator, to avoid crafting the Kubernetes descriptors yourself and to get more guidance on running the DB and setting it up.
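As a rough sketch of that shape (all names, the image, and the port are hypothetical, and the DB Pod with its persistent-disk-backed volume is omitted): the database gets one Service, and each of the three node APIs reaches it by that Service's DNS name rather than by any Pod IP.

apiVersion: v1
kind: Service
metadata:
  name: postgres                 # hypothetical name for the single DB instance
spec:
  selector:
    app: postgres                # matches the DB Pod backed by the persistent disk
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-api               # one of the three node APIs (placeholder name)
spec:
  replicas: 1
  selector:
    matchLabels:
      app: orders-api
  template:
    metadata:
      labels:
        app: orders-api
    spec:
      containers:
        - name: orders-api
          image: my-registry/orders-api:1.0   # placeholder image
          env:
            - name: DATABASE_HOST
              value: postgres    # all three APIs point at the same Service name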
If it's a new project and you've not used k8s before, then you'll also have to decide where to host your code, your Docker images, and your deployment descriptors, and how to set up your CI pipelines. If you don't have answers to these questions already, then I'd suggest looking at Jenkins-X, as it will provide you with out-of-the-box defaults for a whole cluster and CI setup, plus a template ('build pack') for building node apps and deploying them to staging and prod environments through a pipeline.

Will the master know the data on workers/nodes in k8s

I am trying to deploy a Kubernetes setup on the cloud, and there are two options: the masters are either entrusted to the cloud provider or maintained by myself.
So I wonder: if the masters are entrusted to the provider, could they leak the data on the workers?
In short, will the master know the data on the workers/nodes?
The abstractions in Kubernetes are very well defined with clear boundaries. You have to understand the concept of Volumes first. As defined here,
A Kubernetes volume is essentially a directory accessible to all containers running in a pod. In contrast to the container-local filesystem, the data in volumes is preserved across container restarts.
Volumes are attached to the containers in a Pod, and there are several types of volumes.
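As a small, hypothetical illustration of the concept (the Pod name and image are placeholders): the volume below is declared at the Pod level and mounted into the container, and with an emptyDir its data lives on the worker node that runs the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: volume-demo              # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /data       # where the volume appears inside the container
  volumes:
    - name: scratch
      emptyDir: {}               # backed by the worker node's local storage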
You can see these layers of abstraction in the diagram linked from the original answer.
Master to Cluster communication
There are two primary communication paths from the master (apiserver) to the cluster. The first is from the apiserver to the kubelet process which runs on each node in the cluster. The second is from the apiserver to any node, pod, or service through the apiserver’s proxy functionality.
Also, you should check the CCM. The cloud controller manager (CCM) concept (not to be confused with the binary) was originally created to allow cloud-specific vendor code and the Kubernetes core to evolve independently of one another. The cloud controller manager runs alongside other master components such as the Kubernetes controller manager, the API server, and the scheduler. It can also be started as a Kubernetes addon, in which case it runs on top of Kubernetes.
Hope this answers all your questions related to Master accessing the data on Workers.
If you are still looking for more secure ways, check 11 Ways (Not) to Get Hacked
Short answer: yes the control plane can access all of your data.
Longer and more realistic answer: probably don't worry about it. It is far more likely that any successful attack against the control plane would be just as successful as if you were running it yourself. The exact internal details of GKE/AKS/EKS are a bit fuzzy, but all three providers have a lot of experience running multi-tenant systems and it wouldn't be negligent to trust that they have enough protections in place against lateral escalations between tenants on the control plane.

Advantages to containerized Kubernetes master processes

Both the Kubernetes HA guide and the From Scratch guide recommend running Etcd, kube-apiserver, kube-controller-manager, and kube-scheduler in containers. The idea of self-hosting Kubernetes on Kubernetes goes back quite a while (see PR 167 on K8s github and issues/PRs linked there), but I haven't found a discussion about why this approach is so beneficial that it should be the 'recommended' way. Here are the benefits and drawbacks as I see them currently:
Benefits:
Potentially easy upgrade path to just update manifests and have kubelet pull new images.
"Container advantages": binary environment and the host environment separate, leverage others' existing images, etc.
Follows the whole Kubernetes pattern, so 'fits the brain' once you are using that pattern extensively.
Drawbacks:
Increased installation/configuration complexity in some cases. For example, if your Etcd cluster is separate from your Kubernetes nodes, you now have to install Docker (with possible storage changes depending on Linux distro), kubelet, and Etcd. Without using containerized Etcd, you just have that one binary to install.
Increased complexity at run time: With more moving parts, any bug in Docker or kubelet may be able to render critical components non-functional.
I'm new to Kubernetes (and containers) and feel like I might be missing advantages (or underestimating their value) compared to the extra complexity this introduces. But I also have to choose one way to try. Why are containerized master components the recommended way to run Kubernetes despite the extra complexity?
The biggest benefit is streamlined setup for most people. Running a few docker run commands is way easier than downloading binaries, unpacking, fine-tuning init scripts (which are different on every distro), running a supervisor, etc. We have a pretty good process manager - relying on that is powerful.
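To make the "update manifests and have kubelet pull new images" upgrade path from the question concrete, one common shape is a static Pod manifest dropped into the kubelet's manifest directory, e.g. /etc/kubernetes/manifests. This is a hedged sketch, not an official recommendation; the image tag, paths, and flags are illustrative only:

apiVersion: v1
kind: Pod
metadata:
  name: etcd
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.3.10          # upgrading = bumping this tag
      command:
        - etcd
        - --data-dir=/var/lib/etcd
        - --listen-client-urls=http://127.0.0.1:2379
        - --advertise-client-urls=http://127.0.0.1:2379
      volumeMounts:
        - name: etcd-data
          mountPath: /var/lib/etcd
  volumes:
    - name: etcd-data
      hostPath:
        path: /var/lib/etcd                       # data survives container restarts

Here the kubelet acts as the process manager: it watches the manifest directory and restarts the container if it dies.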
We also don't recommend sharing etcd, so if you're doing that you are already off the beaten path.
Overall, containerized components are vastly simpler than the alternatives for most people.