Setting up a highly available Gitlab server - server

I need to setup a highly available Gitlab server on a bare metal, also i would know the best practices ( includes Security, Networking, authorizations, firewall ... etc.) to make the job done.

Configuring GitLab to be highly available is a complex process. Even a minimal scaled environment consists of a minimum of 5 distinct nodes/servers. Something much closer to actual high availability can require 11 nodes. See High Availability documentation for more information.
Please note that GitLab EE Omnibus also include some Premium/Ultimate-only features that make HA much easier - bundled Redis Seninel, Consul, PgBouncer, RepMgr, etc. This is in addition to access to the Support team for HA setup and configuration assistance.
Not that I'm trying to sell you on GitLab EE. But that may help to illustrate that this is a complex topic. If you truly need HA GitLab then it's probably a really critical part of your organization and that's why GitLab provides those features and services to Premium/Ultimate customers.
That said, HA GitLab can be achieved with GitLab CE/Core. But you will have to know how to configure each component including PostgreSQL replication, Redis/Redis Sentinel, load balancers, etc.

Related

Web application deployment approach using Google Cloud - GKE

I deploying a python + tensorflow + flask application using a fully managed Google Cloud Run Service (1 vCPUs and 4 GB Ram).
System works fine but it is really slow, so I am evaluating ways of making it fast (it needs to run 20-30 times faster than what is doing now)
What would be the best approach?
To use a Kubernetes Cluster with one or two powerful machines
To use a Kubernetes Cluster with 3-5 weaker machines
To forget about Kubernets/Docker and run everything on single powerfull VM
Something else maybe?
For now I don't expect to have more than 10 users at a time but I want to be able to scale it up eventually.
You might want to evaluate according to your use case
Per this article, Fully managed Cloud Run is an ideal serverless platform for stateless containerized microservices that don’t require Kubernetes features like namespaces, co-location of containers in pods (sidecars) or node allocation and management.
GKE is a great choice if you are looking for a container orchestration platform that offers advanced scalability and configuration flexibility.
You mentioned you are looking the cheaper/easier method to develop, but this will probably not be as scalable, efficient or manageable, you might want to take a closer look at all cloud compute options in GCP to see what could benefit your use case the most.
You mentioned your use case is CPU intensive, so you might want to leverage the high CPU machine types, these might be used directly by creating a VM, creating an instance group or using them in other services like GKE or App Engine

Best way to deploy MongoDB to Google Cloud Platform?

Been working on a web app with a simple database model that only needs CRUD operations, figured MongoDB would be perfect for it. The most important constraints of the project is that it be able to scale from a small amount of users to a large amount. I’ve been looking at the cloud launcher and I’ve noticed that the most popular MongoDB solution advertises a cost of ~$350/mo. This is a surprisingly large amount that makes me consider using cloud sql for my database instead. Is there a better way to deploy MongoDB to GCP that’s more fitted to my use case? I’ve been reading about automatic scaling with kubernetes but I can’t find anything about price. Any and all advice is greatly appreciated
I haven't used mongodb with kubernetes but we do use the cloud launcher solution at work. We use 2 nodes(n1-standard-1) and an arbiter(micro) + 100GB storage on each node which comes up around $100 a month. You would need a replicaset in a production environment so this seems to be a reasonable base cost.
Kubernetes does not provide a lot of advantages over the classic GCE deployment for mongodb compared to a webserver. Setting up a replicaset on kubernetes is a bit more work compared to GCE setup. https://medium.com/google-cloud/mongodb-replica-sets-with-kubernetes-d96606bd9474 and http://blog.kubernetes.io/2017/01/running-mongodb-on-kubernetes-with-statefulsets.html should serve as decent references but wouldn't lower your costs. Scaling nodes would be slightly easier though but does not strictly translate to scaling mongodb.
I have lately been working on a similar solution.
GCP announced that they don't charge for Kubernetes cluster management but only for resources used by it (instances, network ...):
https://cloud.google.com/kubernetes-engine/pricing
In general, databases are high maintenance (data mounts, backups, migrations...), so I would not start running Mongo on Kubernetes right away. You could get there but it will be more complicated than deploying your web app on Kubernetes.
Better to use MongoDB as a service that supports GCP (e.g. MongoDB Atlas), I have done so myself and see a few other companies do that.
If you scale gradually you should be able to control your costs.
The web app itself should be easy to deploy and maintain on Kubernetes.

Best way to setup ELK in production

Is it a good practice to setup Elasticsearch, logstash and kiban on 3 different servers, with each server having RAM of 8GB.
Or
Setup ELK on 1 single machine with higher memory of 16GB.
The machine needs to be highly available.
Can anyone suggest or share inputs
it depends on your task and situation. normally it is good practice to setup Elasticsearch, logstash and kiban on 3 different server. or if you data if more so you have to make a cluster of elastic search or may have more than one server of logstash .
filefeats will be on all the data(log) server .
there are an example of handling 25000 logs per secoung
https://engineering.viki.com/blog/2015/log-processing-at-scale-elk-cluster-at-25k-events-per-second/
Its slightly more complicated than explained here,
Any distributed component would try to offer features with sharded or partioned way. In a similar way the Elastic Search at ELK which is based out of Master Slave model and maintains the data at ES data nodes. This means one needs to set up a cluster of nodes for Elastic search itself for its various components such as ES Master, ES data and ES client.
The next level if the system is required at production grade which requires Multi master setup with minimum 3 master nodes.
This would be the beginning of ELK.
If one needs to run such a complex system in a limited resources, then Containerizing the ELK components and running them in a container orchestration framework is the recommended option. Kubernetes/Docker swarm are the options to run ELK cluster based on the dockerized instances of ELK. Again these orchestration frameworks also require multimaster setup , but that would be fair as one would have lot more components in a cloud environment and all of them could be controlled under these orchestration frameworks.

difference between dcos-kafka-service and mesos-kafka

I’m doing a POC to deploy Kafka as an application on Mesos Cluster. I came across these 2 codebases on github. One developed by apache-mesos (github page) & other developed by mesosphere and can run only on DCOS (github page).
Question: Would like to know if there are any differences between DCOS-Kafka & mesos-Kafka in terms of features and extended functionality.
Regarding Mesos-Kafka:
I don’t see active participation on github (and some open issues) for mesos-kafka in the past months. Can I assume that the service is robust enough that I can use in production environment? Any Inputs on this would be helpful.
kakfa-mesos is a package that includes a release of Kafka and a custom mesos scheduler that was meant to work around issues with running Kafka as a stateful service on Marathon. I think post but confluent is useful. It also includes a RESTful api for doing ops tasks and aims to include these features in the future (this is pulled from the article I linked)
Integrating Kafka commands (e.g. kafka-topics, etc) into the Scheduler so it can be used through the CLI and REST API.
Auto-scaling clusters (including auto reassignment of partitions) so that the resources (CPU, RAM, etc.) that brokers are using can be used elsewhere in known valleys of traffic.
Rack-aware partition assignment for fault tolerance.
Hooks so that producers and consumers can also be launched from the Scheduler and managed with the cluster.
Automated partition reassignment based on load and traffic
I haven't used it in a production environment myself but it has the support of Confluent which is a good sign.
DC/OS Kafka on the other hand is a DC/OS service which will probably only be useful if you are already running or plan on running services through Mesosphere's DC/OS. It also includes an API and a CLI management tool but is less ambitious with additional features. It's current feature set includes
Single-command installation for rapid provisioning
Multiple clusters for multiple tenancy with DC/OS
High availability runtime configuration and software updates
Storage volumes for enhanced data durability, known as Mesos Dynamic * * Reservations and Persistent Volumes
Integration with syslog-compatible logging services for diagnostics and troubleshooting
Integration with statsd-compatible metrics services for capacity and performance monitoring

Apache Mesos vs Google Kubernetes

What's the difference between Apache's Mesos and Google's Kubernetes
I read the accepted answers but I'm still confused what the differences are.
If Kubernetes is a cluster management then what does Mesos do (I understand what it does from watching bunch of videos but I suppose I'm more confused how those two work together)?
From reading both Kubernetes and Marathon are "framework" sitting on top of Mesos?
What is Mesos responsible for and what are Kubernetes/Marathon responsible for and how do they work with each other?
EDIT:
I think the better question is When would I want to use Kubernetes on top of Mesos vs just running Mesos alone?
Mesos is another abstraction layer. It simply abstracts underlying hardware so the software that want to run on the top of it could only define required resources without having to know any other information.
Kubernetes could do similar thing but without abstraction provided by Mesos you can't run other frameworks (e.g., Spark or Cassandra) on same machine without manually dividing it between those frameworks.
Apache Mesos is a resource manager that shares resources (CPU shares, RAM, disk, ports) across a cluster of machines in a fair way. By sharing, I mean it offers these resources to so called framework schedulers (such as Marathon) and thereby has a clear separation of concerns in terms of resource management and scheduling decisions (which is implemented, depending on the job type, for example long-running or batch, by the framework scheduler). See also the Mesos architecture for further details.