Running Kubernetes locally

I am planning to test Kubernetes locally, but I would like to ask some theoretical questions first.
I created a pipeline in Python that takes a whole bunch of files from a directory as input, and I built a Docker image out of it (this is my Pod).
What I understood from the documentation is that the Kubernetes scheduler automatically chooses the minion on which to deploy a given task. My question is: on a laptop with 8 GB of memory, is there a rule to follow when creating minions (i.e., for deciding how many minions to deploy) based on the amount of memory available on the machine (regardless of whether it is a laptop or a cluster)?
Thanks

You would typically only ever have one minion per host. So if you are deploying your minions on physical hardware, there is a 1:1 mapping between minions and physical hosts.
If you are deploying into a virtual cluster on your laptop, you will want to make sure that each virtual minion has enough memory to run at least a single instance of whatever containers you plan on deploying. "How much is enough?" is a question that only you can answer.
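For what it's worth, here is a minimal sketch of sizing a single virtual minion on an 8 GB laptop, assuming a tool like minikube is used for the local cluster (the numbers are illustrative, not a rule):

    # Assumption: minikube provides the local cluster; sizes are illustrative.
    # Give the local node ~4 GB and 2 CPUs, leaving the rest for the host OS.
    minikube start --memory 4096 --cpus 2

    # Verify what the node actually reports as capacity/allocatable resources:
    kubectl describe node minikube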

Related

Tomcat in k8s pod and db in cloud - slow connection

I have Tomcat, ZooKeeper, and Kafka deployed in a local k8s (kind) cluster. The database is remote, i.e., in the cloud. The pages load very slowly.
But when I moved Tomcat outside of the pod and started it manually, with ZooKeeper and Kafka still in the local k8s cluster and the db still in the remote cloud, the pages load fine.
Why is Tomcat very slow when inside a Kubernetes pod?
In theory, a program running in a container can run as fast as a program running on the host machine.
In practice, there are many things that can affect the performance.
When running on Windows or macOS (for instance with Docker Desktop), containers don't run directly on the machine, but in a small Linux virtual machine. This VM adds a bit of overhead, and it might not have as much CPU and RAM as the host environment. One way to look at the resource usage of containers is to use docker stats; or docker run -ti --pid host alpine and then classic UNIX tools like free, top, vmstat, ... to see the resource usage in the VM.
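For instance, the two inspection approaches mentioned above look like this (the tools shown are standard, though exactly which ones ship in the alpine image varies):

    # Per-container CPU and memory usage, as reported by Docker:
    docker stats

    # Start a throwaway container sharing the VM's PID namespace...
    docker run -ti --pid host alpine
    # ...then, inside it, use classic UNIX tools to inspect the whole VM:
    free -m      # memory available to the VM, not the host
    top          # all processes in the VM, thanks to --pid host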
In most environments (at least with Docker, and with Kubernetes clusters in their most common default configurations), containers run without resource constraints and limits. However, it is fairly common (and, in fact, highly recommended!) to set resource requests and limits when running containers on Kubernetes. You can check resource limits of a pod with kubectl describe. If metrics-server is installed (which is recommended, even on dev/staging environments), you can check resource usage with kubectl top. Tools like k9s will show you resource requests, limits, and usage in a comprehensive way (as long as the data is available; i.e. you still need to install metrics-server to obtain pod metrics, for instance).
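Concretely, the checks described above look like this (pod and deployment names are placeholders):

    # Inspect the requests/limits declared on a pod:
    kubectl describe pod my-tomcat-pod

    # Actual usage; requires metrics-server to be installed:
    kubectl top pod my-tomcat-pod

    # One way to set requests and limits on an existing deployment:
    kubectl set resources deployment my-tomcat \
        --requests=cpu=500m,memory=1Gi \
        --limits=cpu=1,memory=2Gi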
In addition to the VM overhead described above, if the container does a lot of I/O (whether disk or network), there might be a bit of overhead compared to a native process. This can become noticeable if the container writes to its copy-on-write filesystem (instead of a volume), especially when using the device-mapper storage driver.
Applications that use "live reload" techniques (that automatically rebuild or restart when source code is edited) are particularly prone to this I/O issue, because there are unfortunately no efficient methods to watch file modifications across a virtual machine boundary. This means that many web frameworks exhibit extreme performance degradation when running in containers on macOS or Windows when the source code is mounted into the container.
In addition to these factors, there can be other subtle differences that might affect the overall performance of a containerized application. When observing performance issues, it is very helpful to use a profiler (or some kind of APM solution) to see which parts of the code take longer to execute. If no profiler or APM is available, try to execute individual portions of the code independently to compare their performance. For instance, have a small piece of code that executes a single query to the database; or executes a single task from a job queue, etc.
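If nothing else is at hand, even a crude timing comparison, run once from inside the pod and once from outside the cluster, can narrow things down (the URL here is a placeholder):

    # -w prints timing details after the request completes.
    curl -s -o /dev/null \
        -w 'connect: %{time_connect}s  total: %{time_total}s\n' \
        http://my-service:8080/health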
Good luck!

How can I fix ceph commands hanging after a reboot?

I'm pretty new to Ceph, so I've included all my steps I used to set up my cluster since I'm not sure what is or is not useful information to fix my problem.
I have 4 CentOS 8 VMs in VirtualBox set up to teach myself how to bring up Ceph: 1 is a client and 3 are Ceph monitors. Each Ceph node has six 8 GB drives. Once I learned how the networking worked, it was pretty easy.
I set each VM to have a NAT (for downloading packages) and an internal network that I called "ceph-public". This network would be accessed by each VM on the 10.19.10.0/24 subnet. I then copied the ssh keys from each VM to every other VM.
I followed this documentation to install cephadm, bootstrap my first monitor, and add the other two nodes as hosts. Then I added all available devices as OSDs, created my pools and images, and copied my /etc/ceph folder from the bootstrapped node to my client node. On the client, I ran rbd map mypool/myimage to map the image as a block device, then used mkfs to create a filesystem on it, and I was able to write data and see the I/O from the bootstrapped node. All was well.
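For reference, the client-side steps described above look roughly like this (pool and image names are the ones from the question; the device name may differ):

    # On the client, after copying /etc/ceph from the bootstrapped node:
    rbd map mypool/myimage        # exposes the image, e.g. as /dev/rbd0
    mkfs.ext4 /dev/rbd0           # or whichever filesystem you prefer
    mount /dev/rbd0 /mnt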
Then, as a test, I shut down and restarted the bootstrapped node. When it came back up, I ran ceph status, but it just hung with no output. Every single ceph and rbd command now hangs, and I have no idea how to recover or properly reset or fix my cluster.
Has anyone ever had the ceph command hang on their cluster, and what did you do to solve it?
Let me share a similar experience. Some time ago I also tried to perform some tests on Ceph (Mimic, I think), and my VMs in VirtualBox acted very strangely, nothing comparable to actual bare-metal servers, so please bear this in mind... the tests are not quite relevant.
As regarding your problem, try to see the following:
have at least 3 monitors (an odd number, so that a quorum can always be elected; see the commands after this list). It's possible that the hang is because the monitors cannot form a quorum.
make sure the networking part is OK (separate VLANs for Ceph servers and clients)
DNS is resolving OK (you have added the server names to the hosts file)
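A few diagnostic commands that may help pin down a monitor-quorum problem (a sketch; the cephadm specifics depend on your version):

    # Fail fast instead of hanging forever:
    ceph status --connect-timeout 10

    # If any monitor responds, see who is in (and out of) quorum:
    ceph quorum_status --format json-pretty

    # With a cephadm deployment, list the daemons on this host;
    # this works even when the cluster itself is unresponsive:
    sudo cephadm ls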
...just my 2 cents...

Deploy two services on one machine

I have a 4-core, 8 GB machine. I deployed the same service twice, on two different ports. The maximum heap of each service's JVM is 3 GB.
Someone suggested that I deploy only one instance per machine.
Can someone tell me what the advantages and disadvantages of these two approaches are?
Thank you.
There are two advantages I see to running multiple instances in the way you describe:
You are paying for an 8 GB machine, of which you're only using 3 GB. Running a second instance brings that up to 6 GB, which is a much more effective use of the resources you have.
By running two instances of the same service on a single machine you can take advantage of class data sharing (CDS), and of application class-data sharing (AppCDS) if you use a recent JVM, to improve the resource utilisation of your services further; see the sketch below.
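As a sketch, the two instances could be started like this (the jar name, ports, and archive path are made up; -XX:ArchiveClassesAtExit requires JDK 13 or later):

    # First run: exercise the service, then stop it; the loaded classes
    # are written to the shared archive when the JVM exits (JDK 13+).
    java -Xmx3g -XX:ArchiveClassesAtExit=app.jsa -jar myservice.jar --port=8080

    # Subsequent runs: both instances map the same read-only archive,
    # so class metadata is shared between the two processes.
    java -Xmx3g -XX:SharedArchiveFile=app.jsa -jar myservice.jar --port=8080 &
    java -Xmx3g -XX:SharedArchiveFile=app.jsa -jar myservice.jar --port=8081 &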

Azure Service Fabric deployments consume a lot disk space

I operate an on-premises Azure Service Fabric cluster for testing purposes. It consists of three nodes, which run on a single virtual machine (Windows Server 2012) with a 50 GB disk attached to it.
Further, I set up continuous deployment from a TFS release pipeline to the cluster. However, after approx. 80 deployments, Service Fabric had consumed all available disk space and further deployments fail.
Most of the space is taken by C:\ProgramData\SF\Data, which takes around 28 GB, while each code package has a size of ~130 MB. After I unprovisioned many of the old deployments (manually via the SF portal), only around 5 GB were released. Many of the old files are still around in C:\ProgramData\SF\Data.
What is the best approach to improve this?
Why are the files from the old deployments still on disk after unprovisioning?
Is it possible to delete these files manually?
Is it possible to automate the deprovisioning?
In a production environment this situation should be less severe anyhow (since there is only one node per machine and bigger disks). Nevertheless, this would only put off the evil day. I would feel safer avoiding this situation altogether.
Edit
It seems that SF is deleting the deployment packages with some delay. I checked the test cluster after one day, and all unprovisioned packages vanished finally.
Further, I found the Unregister-ServiceFabricApplicationType cmdlet to automate the unprovisioning process (https://msdn.microsoft.com/en-us/library/mt125885.aspx).
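A minimal sketch of that automation in PowerShell (the endpoint, type name, and the version to keep are placeholders):

    # Connect to the cluster first.
    Connect-ServiceFabricCluster -ConnectionEndpoint "mycluster:19000"

    # Unregister every version of an application type except the one
    # that is still in use.
    Get-ServiceFabricApplicationType -ApplicationTypeName "MyAppType" |
        Where-Object { $_.ApplicationTypeVersion -ne "1.0.80" } |
        ForEach-Object {
            Unregister-ServiceFabricApplicationType `
                -ApplicationTypeName $_.ApplicationTypeName `
                -ApplicationTypeVersion $_.ApplicationTypeVersion `
                -Force
        }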

What is called a Node in a WebSphere Network Deployment?

In an installation of WebSphere Application Server with Network Deployment, a node is:
a physical machine
an instance of operative system
a logical set of WAS instances that is independent of physical machine or OS instance
Basically,
A server is a runtime environment, a process of execution.
A node is a grouping of servers that share a common configuration. It usually corresponds to a physical machine.
A cell is a grouping of nodes into a single administrative domain. For WebSphere, this means that if you group several servers within a cell, you can administer them from one WebSphere admin console.
Hope this helps!
@ggasp Here is what I got from IBM's Information Center:
A node is a logical grouping of managed servers.
A node usually corresponds to a logical or physical computer system with a distinct IP host address. Nodes cannot span multiple computers.
http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.nd.multiplatform.doc/info/ae/ae/cagt_node.html
Keep in mind that usually <> always.
Since WAS 6.0 and up, you usually want to set up more than one node on each physical computer; given the usual power of the servers, you use nodes to separate logical business entities.
For example, with 6 nodes, 3 on each of 2 machines, you could pair up the nodes to define 3 different clusters, one for each stage (dev, QA, staging), making each cluster invisible to the others.
A Cell is a virtual unit that is built of a Deployment Manager and one or more Nodes. A Node is another virtual unit that is built of a Node Agent and one or more Server instances.
Here you can find more details including a diagram.