Managing ROCKS cluster - centos

I suddenly became the admin of the cluster in my lab and I'm lost.
I have experience managing Linux servers but not clusters.
A cluster seems to be quite different.
I figured out that the cluster is running CentOS and ROCKS.
I'm not sure what SGE is or whether it is used in the cluster.
Could you point me to an overview or documentation on how a cluster is configured and how to manage it? I googled, but there seem to be lots of ways to build a cluster and it is confusing where to start.

I too suddenly became a Rocks Clusters admin. While your CentOS knowledge will be handy, there are some 'Rocks' ways of doing things that you need to read up on. They mostly start with the CLI commands rocks list and rocks set, and they are very nice to work with once you get to know them.
You should probably start by reading the documentation (this link is for the newest version; you can find yours with 'rocks report version'):
http://central6.rocksclusters.org/roll-documentation/base/6.1/
You can read up on the SGE part at
http://central6.rocksclusters.org/roll-documentation/sge/6.1/
I would recommend signing up for the Rocks Clusters discussion mailing list at:
https://lists.sdsc.edu/mailman/listinfo/npaci-rocks-discussion
The list is very friendly.
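
As a first pass at finding out what is actually installed, a small read-only script can collect the output of a few inspection commands on the frontend. This is only a sketch: `rocks report version` comes from the answer above, while `rocks list roll`, `rocks list host` and SGE's `qhost` are commands I am assuming are available on a typical Rocks frontend with the SGE roll. The `inspect_cluster` helper itself is hypothetical. A command that is not on the PATH is reported as missing instead of raising an error.

```python
import shutil
import subprocess

# Read-only inspection commands. `rocks report version` is from the answer
# above; the rest are assumed to exist on a Rocks frontend with the SGE roll.
INSPECT_COMMANDS = [
    ["rocks", "report", "version"],  # which Rocks release is running
    ["rocks", "list", "roll"],       # installed rolls (an "sge" entry hints that SGE is set up)
    ["rocks", "list", "host"],       # frontend and compute nodes known to Rocks
    ["qhost"],                       # SGE's view of the execution hosts
]


def inspect_cluster(which=shutil.which, run=subprocess.run):
    """Run each command if its executable is on PATH.

    Returns a dict mapping the command line to its stdout, or to None
    when the executable was not found.
    """
    results = {}
    for cmd in INSPECT_COMMANDS:
        key = " ".join(cmd)
        if which(cmd[0]) is None:
            results[key] = None
            continue
        done = run(cmd, capture_output=True, text=True, check=False)
        results[key] = done.stdout
    return results
```

Calling `inspect_cluster()` on the frontend and printing each entry gives a quick inventory. If `qhost` comes back as `None`, that is a hint (not proof) that SGE is not installed on that machine.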

Related

How to avoid congestion when using Kubernetes pods as Jenkins slaves

Our use case is pretty simple; however, I haven't found a solution for it yet.
In the organization I'm working at, we decided to move to Kubernetes as our container manager in order to spin up slaves.
Until we moved to this kind of environment, we had dedicated slaves for each team. Each team got the resources it needed, and on that basis things worked.
However, when we moved to Kubernetes, it started to cause issues because all teams share the same pool of resources, which can lead to congestion or job failures.
The suggested solution was to create a Kubernetes cluster per team; however, this would burn out the teams involved in maintaining multiple clusters.
Searching online, I didn't find any available solution, hence I'm asking here: what is the best way to approach this? I understand that we might need to implement a dispatcher, but currently that's not possible given the way the Kubernetes plugin is developed.
Thanks,

Two versions of fluentd fighting over port in my cluster

Somehow, I have 2 versions of fluentd running in my cluster:
They end up fighting over the same port. They just keep cranking away, trying to start up on that port, and it saturates all the CPU in the cluster.
unexpected error error_class=Errno::EADDRINUSE error="Address already in use - bind(2) for 0.0.0.0:24231
/opt/google-fluentd/embedded/lib/ruby/2.6.0/socket.rb:201:in 'bind'
I've tried deleting the daemon sets and deployments, but they just keep coming back. I also tried ssh'ing into the machines and killing the process on that port. Nothing seems to work.
Obviously, I only want one version of fluentd to run (and I'm not even sure which one).
I seem to have fixed it. I went to the cluster edit page in the GCP dashboard, and the Kubernetes Engine Monitoring dropdown was blank. It seems not even the dropdown could decide what to display here.
It seems the automated agent, or whatever, seriously messed up here, and had 2 versions of the logging and monitoring system running, fighting over a port, and crushing the CPU on every machine in the cluster. On top of that, I couldn't delete the daemon sets, pods, or deployments. It seems Google treats these as special somehow, maybe with some kind of automated agent, I don't know.
From the dropdown, I just selected "System and workload logging and monitoring", saved, and it applied the changes.
Everything is looking good so far, but this whole event has me worried: I didn't do anything. This just... happened.
This is a dev cluster, but if it was a production cluster...
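
The `Errno::EADDRINUSE` error quoted above is what the operating system returns when a second process tries to bind an address and port that another socket already holds. A minimal sketch that reproduces the same condition with two plain TCP sockets is below. It uses the loopback address and an OS-chosen port, not fluentd's 24231, and the helper name `second_bind_error` is hypothetical.

```python
import errno
import socket


def second_bind_error(host="127.0.0.1"):
    """Bind two TCP sockets to the same address and port.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]  # the port the first socket now owns
        try:
            second.bind((host, port))  # same port, already taken
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if second_bind_error() == errno.EADDRINUSE:
    print("second bind failed with EADDRINUSE")
```

A process that retries the bind in a tight loop after this failure never succeeds while the first holder is alive, which is consistent with the CPU saturation described above.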

Akka cluster and OpenShift

I'm new to Akka Clusters, but as I understand the documentation, I need to know at least one "seed node" to join an existing cluster.
So when using clusters with OpenShift I would need to know whether the current gear is the first node (then I would create a new cluster) or whether there are already some other gears around (then I would need to know at least one of their IPs to join them).
Is this possible with the OpenShift cloud? (I'm using the DIY cartridge, so customizing the startup script wouldn't be a problem. However, I can't find any environment variable that provides the relevant data.)
DIY gears on OpenShift Online do not scale. And if you are spinning up separate applications for each of the nodes in your cluster, you are going to (probably) run into inter-gear communication issues. You might need to create your own akka cartridge (http://docs.openshift.org/origin-m4/oo_cartridge_developers_guide.html), then you can set your own scaling options. You might check out this cartridge (https://github.com/smarterclayton/openshift-redis-cart) which supports scaling and might give you some ideas about how to implement yours.
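
The first-node-or-join decision described in the question can be sketched as a small function. This is only an illustration under stated assumptions: how the list of peer hosts is obtained is exactly the open problem here, so `peer_hosts` is a hypothetical input, as are the function name and the system name. The `akka.tcp://system@host:port` address form is the one used by Akka's classic remoting. Newer Akka versions use a different scheme.

```python
def seed_nodes(system, self_host, port, peer_hosts):
    """Return the seed-node addresses this node should try to join.

    peer_hosts is a hypothetical list of hosts already known to be running
    a cluster node; discovering it is outside the scope of this sketch.
    """
    def addr(host):
        # Address form of Akka classic remoting (an assumption, see lead-in).
        return f"akka.tcp://{system}@{host}:{port}"

    peers = [h for h in peer_hosts if h != self_host]
    if not peers:
        # No other node is known: this node starts a new cluster by
        # using itself as the only seed node.
        return [addr(self_host)]
    # Otherwise, join the existing cluster through the known peers.
    return [addr(h) for h in sorted(peers)]
```

For example, `seed_nodes("app", "10.0.0.1", 2551, [])` yields a single self-address, while passing other hosts yields only those hosts as seeds.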

Run MongoDB in Azure

How do I run MongoDB on Windows Azure? Should the instance be deployed on a virtual machine? Are there any out-of-the-box solutions, like images for virtual machines or anything else? How do I run replica sets on Windows Azure?
I saw this article http://docs.mongodb.org/ecosystem/platforms/windows-azure/ but I feel like it is already out of date. Is it?
Any best practices, help or info would be appreciated!
The article that you refer to describes the options quite well. You have three options:
Running MongoDB in worker roles (as linked to in the article). Before Azure VMs, worker roles were the only option, but I wouldn't recommend it.
You can try the MongoDB database-as-a-service offerings that are available in the add-ons store. This would be a good way to try it out. For the longer term, you will have to ask around for people's experience.
I recommend that you run MongoDB on a Linux VM. That way you have full control and support from the Linux/MongoDB community. Replica sets would then work 'out of the box'. The article links to a walkthrough on a CentOS image. You can also get a pre-built image from VMDepot, such as this Ubuntu one. The VMDepot images seem to work very well and are a good start for people with less Linux experience.
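
For the replica set part of the VM option, the configuration document that MongoDB expects when a set is initiated is small enough to build by hand. The sketch below only constructs that document. The host names, the set name `rs0`, and the helper `replica_set_config` are hypothetical, and it assumes each VM already runs `mongod` with the same replica set name.

```python
def replica_set_config(name, hosts, port=27017):
    """Build a replica set configuration document for the given hosts."""
    if not hosts:
        raise ValueError("a replica set needs at least one member")
    return {
        "_id": name,  # must match the replica set name mongod was started with
        "members": [
            {"_id": index, "host": f"{host}:{port}"}
            for index, host in enumerate(hosts)
        ],
    }


# Hypothetical host names for three Linux VMs.
config = replica_set_config("rs0", ["mongo-vm1", "mongo-vm2", "mongo-vm3"])

# With the pymongo driver installed, the document could be submitted from
# a client connected to one of the members, for example:
#   client.admin.command("replSetInitiate", config)
```

The same document can be passed to `rs.initiate()` in the mongo shell.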
Edit: MongoLab seems to be gaining traction, and is getting support from Scott Guthrie. As a service that has affinity with Azure datacentres, it is worth evaluating.
You can use MongoLab; there is a tutorial for it on Azure.
With MongoLab, all the maintenance (at least of the DB engine itself) is taken care of by the MongoLab team. That removes a lot of maintenance overhead on your side.

Learning NOSQL databases using a single machine?

With relational databases I would just open a W3Schools tutorial, install MySQL on my machine and start practicing! How can I learn non-relational databases in a similar way? In most tutorials I read that these databases work with multiple nodes and data centers.
Does this mean that I will be unable to learn and practice, say, Cassandra on my own single PC?
You do it just like you do it with MySQL: you set up a database on your local machine and start experimenting.
Most database systems that focus on sharding and clustering also work as a stand-alone instance. When you want to test these features specifically, you can often run multiple instances on the same machine. When you also want to see how they behave on different machines, you can use virtualization software such as VMware or VirtualBox to set up a bunch of virtual machines and build your virtual datacenter on your desktop.
(I would recommend VMware for business use and VirtualBox for home use.)
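
Running multiple instances on the same machine mostly comes down to giving each one its own port and data directory. Taking MongoDB as one example, the sketch below generates the command lines for a few local `mongod` processes that belong to the same replica set. The directory layout, the set name `rs0`, and the helper name are hypothetical, and the data directories need to exist before `mongod` is started.

```python
import shlex


def local_mongod_commands(count=3, base_port=27017,
                          data_root="/tmp/mongo-lab", repl_set="rs0"):
    """Return one mongod command line per local instance."""
    commands = []
    for i in range(count):
        argv = [
            "mongod",
            "--replSet", repl_set,                 # same set name for all instances
            "--port", str(base_port + i),          # a distinct port per instance
            "--dbpath", f"{data_root}/node{i}",    # a distinct data directory per instance
            "--bind_ip", "127.0.0.1",              # local practice only
        ]
        commands.append(shlex.join(argv))
    return commands


for line in local_mongod_commands():
    print(line)
```

Each printed line can be run in its own terminal. The instances then behave like separate nodes that happen to share one host.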
I'm a big fan of MongoDB. It's the NoSQL equivalent of MySQL.
Go to the Try It Out link on their home page and you can actually use it in a live session on their website - no download, no configuration, no hassle! Just use it and learn the basics.
Here's the quick start for Cassandra. http://wiki.apache.org/cassandra/GettingStarted
I don't see any reason you couldn't run that from localhost. I think the point is that you can scale these NoSQL solutions. You might want to check out MongoDB or CouchDB as well. Both are easy to set up and are great NoSQL solutions in my experience.
I would strongly suggest using something like Amazon EC2 for testing NoSQL solutions. You can definitely install a technology like MongoDB locally and create a replica set, but you should definitely put these on different physical machines if you can.
I have installed things like AppFabric, Couchbase and Mongo locally and created clusters and they always work really well locally. It's very easy because the networking part of it always goes smoothly.
Once you introduce two physical machines and real network partitions, things get difficult.
Last I checked, you can create instances on EC2 for free if you use their Micro instances. You'll learn a lot.