How can we optimise performance of a single-node Cassandra cluster for short-lived testing? - kubernetes

We are running our end-to-end tests using a single Cassandra node running on k8s, this node gets quite a lot reads and writes, note that this node is deleted once tests have finished, so there is no need to consider long term maintenance of data etc. what optimisations would you recommend to configure in this use case to reduce overhead?
Disabling auto compaction had came in my mind... anything else?

So there are always a few things that I do when building up a single node for development or testing. My goals are more about creating something which matches the conditions in production, as opposed to reducing overhead. Here's my list:
Rename the cluster to something other than "Test Cluster."
Set the snitch to GossipingPropertyFileSnitch.
Enable both the PasswordAuthenticator and the CassandraAuthorizer.
If you use client or node to node SSL, you'll want to enable that, too.
Provide non-default values for the dc and rack names in the cassandra-rackdc.properties file.
Create all keyspaces using NetworkTopologyStrategy and the dc name from the previous step.
Again, I wouldn't build an unsecured node with SimpleStrategy keyspaces in production. So I don't test that way, either.
With building a new single node cluster each time, I can't imagine much overhead getting in your way. I don't think that you can fully disable compaction, but you can reduce the compaction throughput (YAML) down to the point where it will consume almost no resources:
compaction_throughput: 1MiB/s
It might be easiest to set that in the YAML, but you can also do this from the command line:
nodetool setcompactionthroughput 1
I'd also have a look at the GC settings, and try to match what you have in production as well. But for the least amount of overhead with the least config, I'd go with G1GC.

Related

Is it possible to run a single container Flink cluster in Kubernetes with high-availability, checkpointing, and savepointing?

I am currently running a Flink session cluster (Kubernetes, 1 JobManager, 1 TaskManager, Zookeeper, S3) in which multiple jobs run.
As we are working on adding more jobs, we are looking to improve our deployment and cluster management strategies. We are considering migrating to using job clusters, however there is reservation about the number of containers which will be spawned. One container per job is not an issue, but two containers (1 JM and 1 TM) per job raises concerns about memory consumption. Several of the jobs need high-availability and the ability to use checkpoints and restore from/take savepoints as they aggregate events over a window.
From my reading of the documentation and spending time on Google, I haven't found anything that seems to state whether or not what is being considered is really possible.
Is it possible to do any of these three things:
run both the JobManager and TaskManager as separate processes in the same container and have that serve as the Flink cluster, or
run the JobManager and TaskManager as literally the same process, or
run the job as a standalone JAR with the ability to recover from/take checkpoints and the ability to take a savepoint and restore from that savepoint?
(If anyone has any better ideas, I'm all ears.)
One of the responsibilities of the job manager is to monitor the task manager(s), and initiate restarts when failures have occurred. That works nicely in containerized environments when the JM and TMs are in separate containers; otherwise it seems like you're asking for trouble. Keeping the TMs separate also makes sense if you are ever going to scale up, though that may moot in your case.
What might be workable, though, would be to run the job using a LocalExecutionEnvironment (so that everything is in one process -- this is sometimes called a Flink minicluster). This path strikes me as feasible, if you're willing to work at it, but I can't recommend it. You'll have to somehow keep track of the checkpoints, and arrange for the container to be restarted from a checkpoint when things fail. And there are other things that may not work very well -- see this question for details. The LocalExecutionEnvironment wasn't designed with production deployments in mind.
What I'd suggest you explore instead is to see how far you can go toward making the standard, separate container solution affordable. For starters, you should be able to run the JM with minimal resources, since it doesn't have much to do.
Check this operator which automates the lifecycle of deploying and managing Flink in Kubernetes. The project is in beta but you can still get some idea about how to do it or directly use this operator if it fits your requirement. Here Job Manager and Task manager is separate kubernetes deployment.

How do I setup a Active / Passive environment with two nodes in OpenShift?

I am trying to configure a Active/Passive cluster with two nodes (using OpenShift). The second passive node should be a hot standby, in other words it is up and running but not doing anything, until the first node dies. Then the passive node becomes active and a new passive node is started.
I have read the High Availability documentation, however it just seems to cover the theory. Furthermore it seems like overkill ( I am thinking there might be an easier way to meet my goal).
Where would I start?
What you are asking for goes against the usual practice for how Kubernetes/OpenShift is used. You wouldn't have hot standby nodes, you would always use all nodes in the cluster. You would then allow for enough additional capacity in your cluster such that loosing a node doesn't cause a problem as other nodes would have enough capacity to then run the applications. In this scenario the Kubernetes scheduler would automatically restart any applications which were on a failed node on the other nodes in the cluster, without you needing to perform any explicit failover steps.
So don't try and do anything special, setup your cluster with the two nodes, with applications being distributed across both. If you need to have the ability to run with only a single node, make sure it has enough capacity to run everything. If over time you add more applications and one node is not enough, add a third node, with all three being used in normal case. You can then handle failure of a single node again.

Persistent storage for Apache Mesos

Recently I've discovered such a thing as a Apache Mesos.
It all looks amazingly in all that demos and examples. I could easily imagine how one would run for stateless jobs - that fits to the whole idea naturally.
Bot how to deal with long running jobs that are stateful?
Say, I have a cluster that consists of N machines (and that is scheduled via Marathon). And I want to run a postgresql server there.
That's it - at first I don't even want it to be highly available, but just simply a single job (actually Dockerized) that hosts a postgresql server.
1- How would one organize it? Constraint a server to a particular cluster node? Use some distributed FS?
2- DRBD, MooseFS, GlusterFS, NFS, CephFS, which one of those play well with Mesos and services like postgres? (I'm thinking here on the possibility that Mesos/marathon could relocate the service if goes down)
3- Please tell if my approach is wrong in terms of philosophy (DFS for data servers and some kind of switchover for servers like postgres on the top of Mesos)
Question largely copied from Persistent storage for Apache Mesos, asked by zerkms on Programmers Stack Exchange.
Excellent question. Here are a few upcoming features in Mesos to improve support for stateful services, and corresponding current workarounds.
Persistent volumes (0.23): When launching a task, you can create a volume that exists outside of the task's sandbox and will persist on the node even after the task dies/completes. When the task exits, its resources -- including the persistent volume -- can be offered back to the framework, so that the framework can launch the same task again, launch a recovery task, or launch a new task that consumes the previous task's output as its input.
Current workaround: Persist your state in some known location outside the sandbox, and have your tasks try to recover it manually. Maybe persist it in a distributed filesystem/database, so that it can be accessed from any node.
Disk Isolation (0.22): Enforce disk quota limits on sandboxes as well as persistent volumes. This ensures that your storage-heavy framework won't be able to clog up the disk and prevent other tasks from running.
Current workaround: Monitor disk usage out of band, and run periodic cleanup jobs.
Dynamic Reservations (0.23): Upon launching a task, you can reserve the resources your task uses (including persistent volumes) to guarantee that they are offered back to you upon task exit, instead of going to whichever framework is furthest below its fair share.
Current workaround: Use the slave's --resources flag to statically reserve resources for your framework upon slave startup.
As for your specific use case and questions:
1a) How would one organize it? You could do this with Marathon, perhaps creating a separate Marathon instance for your stateful services, so that you can create static reservations for the 'stateful' role, such that only the stateful Marathon will be guaranteed those resources.
1b) Constraint a server to a particular cluster node? You can do this easily in Marathon, constraining an application to a specific hostname, or any node with a specific attribute value (e.g. NFS_Access=true). See Marathon Constraints. If you only wanted to run your tasks on a specific set of nodes, you would only need to create the static reservations on those nodes. And if you need discoverability of those nodes, you should check out Mesos-DNS and/or Marathon's HAProxy integration.
1c) Use some distributed FS? The data replication provided by many distributed filesystems would guarantee that your data can survive the failure of any single node. Persisting to a DFS would also provide more flexibility in where you can schedule your tasks, although at the cost of the difference in latency between network and local disk. Mesos has built-in support for fetching binaries from HDFS uris, and many customers use HDFS for passing executor binaries, config files, and input data to the slaves where their tasks will run.
2) DRBD, MooseFS, GlusterFS, NFS, CephFS? I've heard of customers using CephFS, HDFS, and MapRFS with Mesos. NFS would seem an easy fit too. It really doesn't matter to Mesos what you use as long as your task knows how to access it from whatever node where it's placed.
Hope that helps!

MongoDB single server production setup

I am developing a server to a customer who has only one machine for his production deployment.
It's a CentOS 64bit with 8Gb of memory.
I am using Mongo and the question is, do I still need to deploy a replica set even though it's a single machine?
Will I get the advantages of a replica set or since it's a single machine it really does not matter and journaling is enough?
You definitely have to enable journaling (It will ensure consistent state even in cases of HW failure scenarios, you will not have to run costy repair command after a crash). You should enable RAID under the data directrory (Anyway this is in general recommended), while here will be crucial not to lose data due to a disk failure (You do not have copy on an other box or so). There is no option for HA within one box it is quite straightforward, however it is not harmful, and in some cases useful to configure 1 node (1 mongod) replicaset (Than you will have oplog). This will help for example when you likely to have MMS backup, or just for enable point in time backup feature of mongodump. Later if you will likely to scale out for HA this way you will only have to add the new nodes to your initially established replicaset.
Make no sense to run several replicas inside one box, while they will race on HW resources and will bring nothing as an advantage.

Is there a way to make jobs in Jenkins mutually exclusive?

I have a few jobs in Jenkins that use Selenium to modify a database through a website's front end. If some of these jobs run at the same time, errors due to dirty reads can result. Is there a way to force certain jobs in Jenkins to be unable to run at the same time? I would prefer not to have to place or pick up a lock on the database, which could be read or modified by any number of users who are also testing.
You want the Throttle Concurrent Builds plugin which lets you define global and per-node semaphores.
Locks and latches is being deprecated in favor of Throttle Concurrent builds.
I've tried both the locks & latches plugin and the port allocator plugin as ways to achieve what you're trying to do. Neither worked reliably for me. Locks & latches worked some of the time, but I'd occasionally get hung jobs. Using port allocator as a hack will work unless you have multiple jenkins nodes, but the config overhead is kind of high. What I've ultimately settled upon is another hack, but it works reliably and uses core Jenkins stuff (no plugins):
create a slave node running on the same box as the master (or not, if you have lots of boxes)
give this slave a single executor (key)
tie the 2 (or n) jobs that must not run together to this new slave node
optionally set the slave's usage to 'tied jobs only' if it'll screw up your other jobs if they happen to run on the new slave
Since the slave has only one executor, the jobs tied to it can never run together.
If you regard the database as a shared resource that can only be used exclusively then this fits the usecase of the Lockable resources plugin.
It is being actively developed and improved and is very versatile.