AWS MSK vs Confluent for hosting Kafka? [closed] - apache-kafka

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
In terms of investing for the most value, how does AWS MSK compare to Confluent when it comes to hosting an end-to-end Kafka event-sourcing system?
The main criteria to be used for comparing are:
Monitoring
Ease of deployment and configuring
Security

I have used open-source Kafka, on-prem Cloudera, and MSK. Each of them has had its quirks.
If you judge purely by the speed of provisioning a secure Kafka cluster, I think MSK wins hands down. Someone familiar with Kafka, AWS Certificate Manager, and Terraform can get it all done very quickly. There are a few issues around Terraform, TLS, and the AWS CLI, but there are workarounds.
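To make the "speed of provisioning" point concrete: creating an MSK cluster is essentially a single API call once the certificates exist. The sketch below only builds the request payload (so it runs without AWS credentials); field names follow the MSK CreateCluster API as I recall them, and all resource names/ARNs are hypothetical placeholders, so verify against the current boto3 docs before using.

```python
def msk_cluster_request(name, subnets, security_groups, ca_arn):
    """Build a CreateCluster payload for the MSK API (sketch only).

    The certificate authority ARN would come from AWS Certificate
    Manager (ACM Private CA); subnets/security groups from your VPC.
    """
    return {
        "ClusterName": name,
        "KafkaVersion": "3.5.1",
        "NumberOfBrokerNodes": 3,  # one broker per client subnet here
        "BrokerNodeGroupInfo": {
            "InstanceType": "kafka.m5.large",
            "ClientSubnets": subnets,
            "SecurityGroups": security_groups,
        },
        "EncryptionInfo": {
            "EncryptionInTransit": {"ClientBroker": "TLS", "InCluster": True}
        },
        "ClientAuthentication": {
            "Tls": {"CertificateAuthorityArnList": [ca_arn]}
        },
    }

# The payload would then be passed to boto3, e.g.:
#   boto3.client("kafka").create_cluster(**req)
req = msk_cluster_request(
    "demo-cluster",
    ["subnet-a", "subnet-b", "subnet-c"],
    ["sg-1"],
    "arn:aws:acm-pca:region:acct:certificate-authority/example",
)
print(req["ClusterName"], req["NumberOfBrokerNodes"])
```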
If you are planning to use Kafka Connect, then Confluent makes a lot of sense.
If you have Kafka developers experienced in writing Kafka Connect sources and sinks, you may not need a subscription-based model from Confluent. You may not save a lot of money either way, though: you either spend on development or on subscription costs.
If you like serverless, MSK is quite good. However, there is no SSH access to the Kafka cluster, and you cannot tune the JVM.
Monitoring is built in out of the box for MSK via open monitoring, which exposes JMX metrics to Prometheus; CloudWatch is available as well, but open monitoring gives pretty much all the metrics you need. With open-source Kafka you can easily deploy the same monitoring yourself; MSK is essentially doing that for you.
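Open monitoring means each broker exposes a Prometheus scrape endpoint (on port 11001 for the JMX exporter, if I remember the MSK docs correctly). Anything that speaks the Prometheus text format can consume it; as a rough illustration, here is a minimal parser for that format with a made-up sample of the kind of metrics MSK exposes (metric names are illustrative, not exact):

```python
def parse_prometheus_text(text):
    """Parse Prometheus text exposition format into {metric: value}.

    Minimal sketch: keeps only the metric name (labels stripped),
    skips blank lines and # HELP / # TYPE comments.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_part, _, value = line.rpartition(" ")
        # Drop any label set: metric{topic="t"} -> metric
        name = name_part.split("{", 1)[0]
        metrics[name] = float(value)
    return metrics

# Hypothetical sample of a broker scrape:
sample = """# HELP kafka_server_BrokerTopicMetrics_MessagesInPerSec rate
kafka_server_BrokerTopicMetrics_MessagesInPerSec{topic="orders"} 42.0
kafka_controller_KafkaController_ActiveControllerCount 1
"""
print(parse_prometheus_text(sample))
```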
MSK provides security using either TLS or IAM, though there are some issues around enabling IAM-based security for MSK using Terraform. Two-way TLS client authentication is quite easy to set up.
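The two auth modes mostly differ in client configuration. Below is a rough sketch using kafka-python-style parameter names (paths and ports are placeholders; MSK uses 9094 for the TLS listener and 9098 for the IAM/SASL listener, if memory serves). The IAM branch is illustrative only: in practice a Python client also needs a token provider such as the aws-msk-iam-sasl-signer library, which is elided here.

```python
def msk_client_config(bootstrap, auth="tls", cert_dir="certs"):
    """Return Kafka client settings for the two MSK auth modes (sketch).

    Keyword names mirror kafka-python's KafkaProducer/KafkaConsumer
    arguments; cert paths and listener ports are hypothetical.
    """
    if auth == "tls":  # two-way TLS client authentication
        return {
            "bootstrap_servers": bootstrap,       # TLS listener (9094)
            "security_protocol": "SSL",
            "ssl_cafile": f"{cert_dir}/ca.pem",
            "ssl_certfile": f"{cert_dir}/client.pem",
            "ssl_keyfile": f"{cert_dir}/client.key",
        }
    if auth == "iam":  # IAM access control
        return {
            "bootstrap_servers": bootstrap,       # SASL listener (9098)
            "security_protocol": "SASL_SSL",
            "sasl_mechanism": "AWS_MSK_IAM",      # needs a token signer
        }
    raise ValueError(f"unknown auth mode: {auth}")

print(msk_client_config("b-1.example:9094", "tls")["security_protocol"])
```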
MSK also provides auto-scaling, but again, if you are planning to use Terraform, there may be some interoperability issues.
I am sure folks here can add a lot more about Confluent.

Related

Need to setup a Customized Kubernetes Logging strategy [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
So far, in our legacy deployments of web services to VM clusters, we have effectively been using Log4j2-based multi-file logging onto a persistent volume, where the log files are rolled over each day. We need to maintain logs for about three months before they can be purged.
We are migrating to a Kubernetes infrastructure and have been struggling with what the best logging strategy to adopt with Kubernetes clusters would be. We don't much like the strategies that involve spitting out all logging to STDOUT/STDERR and using some centralized tool like Datadog to manage the logs.
Our Design requirements for the Kubernetes Logging Solution are:
Use Log4j2 with multiple file appenders.
We want to maintain the multi-file log-appender structure.
We want to preserve the rolling logs in archives for about three months.
Need a way to have easy access to the logs for searching, filtering, etc.
The kubectl setup for viewing logs may be a bit too cumbersome for our needs.
Ideally we would like to use the Datadog dashboard approach BUT using multi-file appenders.
The serious limitation of Datadog we run into is the need to have everything pumped to STDOUT.
Starting to use container platforms, or building containers, means that as a first step we must change our mindset. Creating log files inside your containers is not a best practice, for two reasons:
Your containers should be stateless, so they should not save anything internally; when a container is deleted and recreated, its files disappear.
When you send your output using passive logging (STDOUT/STDERR), Kubernetes creates the log files for you, and these files can be used by platforms like fluentd or Logstash to collect the logs and send them to a log-aggregation tool.
I recommend using passive logging, which is the approach recommended by Kubernetes and the standard for cloud-native applications. You may also need to run your app on a cloud service in the future, and those also use passive logging to surface application errors.
The following links give some references on why Kubernetes recommends passive logging:
k8s checklist best practices
Twelve Factor Applications Logging
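The question is about Log4j2, but the passive-logging shape the answer recommends is language-agnostic: emit one structured line per event to STDOUT and let the cluster's collector handle files and retention. A minimal Python sketch of that pattern (field names in the JSON payload are my own choice, not a standard):

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line on STDOUT, the shape
    collectors like fluentd or Logstash expect from containers."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

def configure_passive_logging():
    # One handler, pointed at STDOUT rather than any file on disk.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(logging.INFO)

configure_passive_logging()
logging.getLogger("payments").info("order %s accepted", "A-17")
```

The "multiple appenders" requirement can usually be preserved as multiple named loggers: the collector splits streams back out by the `logger` field instead of by file name.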

What AWS services do I need for Angular, Spring Boot and PostgreSQL projects? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 3 years ago.
In researching the many AWS offerings and plans, I'm overwhelmed by the terminology and pricing around Docker, RDS, EC2, Beanstalk, and trying to wrap my head around it all. In the end, all we'd like is the cheapest way to host internal Angular 7+ apps that have a corresponding Spring Boot REST API which pulls from a PostgreSQL database. Of course, each app/REST/DB stack should have dev, test, and prod environments as well. Utilizing AWS, what is a good and cost-effective way to achieve these requirements?
Angular - Use S3 and CloudFront (static content)
Spring Boot REST APIs - Use EC2, Beanstalk, or Lambda (for serverless)
PostgreSQL - Use RDS or install it on an EC2 instance.
For Angular and the Spring Boot REST APIs, you can host both of them on an EC2 machine.
For the database, you can host the PostgreSQL servers on EC2 machines for the dev and test environments, and for production you can choose RDS.

Using Snakemake with Azure Kubernetes Services [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Does anyone have a working example of using Snakemake with Azure Kubernetes Service (AKS)? If it is supported, which flags and setup are needed to use the Snakemake Kubernetes executor with AKS? Most of the material out there covers AWS with S3 buckets for storage.
I have never tried it, but you can basically take this as a blueprint and replace the Google storage part with a storage backend that works in Azure. As far as I know, Azure has its own storage API, but there are workarounds to expose an S3 interface (google for "Azure S3"). So the strategy would be to set up an S3 API and then use the S3 remote provider for Snakemake. In the future, Snakemake will also support Azure directly as a remote provider.
You are probably aware of this already, but for the benefit of others:
Support for AKS has now been built into Snakemake. This works even without a shared filesystem. Most parts of the original blog post describing the implementation have made it into the official Snakemake executor documentation.
In a nutshell: upload your data to Blob storage and deploy an AKS cluster. Then run Snakemake with these flags: --default-remote-prefix --default-remote-provider AzBlob and --envvars AZ_BLOB_ACCOUNT_URL AZ_BLOB_CREDENTIAL, where AZ_BLOB_CREDENTIAL is optional if you use a SAS in the account URL. You can use your Snakefile as is.
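Putting those flags together, a full invocation might look like the sketch below. The container image name and the --default-remote-prefix value (the Blob container holding your data) are hypothetical placeholders; check the Snakemake executor documentation for the exact current flag set.

```shell
# Hypothetical invocation; names are placeholders
export AZ_BLOB_ACCOUNT_URL="https://myaccount.blob.core.windows.net/?sv=...SAS..."
snakemake --kubernetes \
    --container-image my-workflow-image \
    --default-remote-provider AzBlob \
    --default-remote-prefix my-blob-container \
    --envvars AZ_BLOB_ACCOUNT_URL \
    --jobs 4
```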

difference between dcos-kafka-service and mesos-kafka

I’m doing a POC to deploy Kafka as an application on a Mesos cluster. I came across these two codebases on GitHub: one developed by apache-mesos (github page) and the other developed by Mesosphere, which can run only on DC/OS (github page).
Question: I would like to know if there are any differences between DC/OS Kafka and mesos-kafka in terms of features and extended functionality.
Regarding mesos-kafka:
I don’t see active participation on GitHub (and some open issues) for mesos-kafka in the past months. Can I assume that the service is robust enough to use in a production environment? Any inputs on this would be helpful.
kafka-mesos is a package that includes a release of Kafka and a custom Mesos scheduler that was meant to work around issues with running Kafka as a stateful service on Marathon. There is a useful post from Confluent about it. It also includes a RESTful API for doing ops tasks and aims to include these features in the future (this is pulled from the article I linked):
Integrating Kafka commands (e.g. kafka-topics, etc) into the Scheduler so it can be used through the CLI and REST API.
Auto-scaling clusters (including auto reassignment of partitions) so that the resources (CPU, RAM, etc.) that brokers are using can be used elsewhere in known valleys of traffic.
Rack-aware partition assignment for fault tolerance.
Hooks so that producers and consumers can also be launched from the Scheduler and managed with the cluster.
Automated partition reassignment based on load and traffic
I haven't used it in a production environment myself but it has the support of Confluent which is a good sign.
DC/OS Kafka, on the other hand, is a DC/OS service which will probably only be useful if you are already running, or plan on running, services through Mesosphere's DC/OS. It also includes an API and a CLI management tool but is less ambitious with additional features. Its current feature set includes:
Single-command installation for rapid provisioning
Multiple clusters for multiple tenancy with DC/OS
High availability runtime configuration and software updates
Storage volumes for enhanced data durability, known as Mesos Dynamic Reservations and Persistent Volumes
Integration with syslog-compatible logging services for diagnostics and troubleshooting
Integration with statsd-compatible metrics services for capacity and performance monitoring

How to use kafka and storm on cloudfoundry?

I want to know if it is possible to run Kafka as a cloud-native application, and whether I can create a Kafka cluster as a service on Pivotal Web Services. I don't want only client integration; I want to run the Kafka cluster/service itself.
Thanks,
Anil
I can point you at a few starting points; there would be some work involved to go from those starting points to something fully functional.
One option is to deploy the Kafka cluster on Cloud Foundry (e.g. Pivotal Web Services) using Docker images. Spotify has Dockerized Kafka and kafka-proxy (including ZooKeeper). One thing to keep in mind is that PWS currently doesn't support apps with persistence (although this work is starting), so if you were to go this route right now, you would lose the data in Kafka when the application is rolled. Looking at that Spotify repo, it looks like the Docker images are generally run without any mounted volumes, so this persistence-less Kafka seems like it may be a valid use case (I don't know enough about Kafka to say).
The other option is to deploy Kafka directly on some IaaS (e.g. AWS) using BOSH. BOSH can be hard if you're seeing it for the first time, but it is the ideal way to deploy any distributed software that you want running on VMs. You will also be able to have persistent volumes attached to your Kafka VMs if necessary. Here is a Kafka BOSH release which may work.
Once you have your cluster running, you have two ways to integrate your Cloud Foundry applications with it. The simplest is just to provide it to your applications as a "user-provided service", which lets you flow Kafka cluster access info to your apps. The alternative would be to put a service broker in front of your cluster, which would be especially useful if you have many different people who will be pushing apps that need to talk to the Kafka cluster. Rather than you having to manually tell people the access info each time, they can do something simple like cf bind-service SOME_APP YOUR_KAFKA_SERVICE. Here is a Kafka service broker along with more info about service brokers in general.
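Either way, a bound service shows up in the app's VCAP_SERVICES environment variable as JSON. The sketch below reads broker addresses out of it; the service name and the "brokers" credential field are hypothetical (they are whatever the operator chose when creating the service), and real user-provided services are nested under a "user-provided" label, which is simplified away here.

```python
import json
import os

def kafka_brokers_from_vcap(service_name="kafka"):
    """Pull broker addresses for a bound service out of VCAP_SERVICES.

    Cloud Foundry injects VCAP_SERVICES as a JSON object keyed by
    service label; the "brokers" field is whatever credentials the
    operator supplied when creating the service (hypothetical here).
    """
    vcap = json.loads(os.environ.get("VCAP_SERVICES", "{}"))
    for instance in vcap.get(service_name, []):
        return instance["credentials"]["brokers"]
    raise LookupError(f"no bound service named {service_name!r}")

# Simulated environment, roughly what the app would see after:
#   cf create-user-provided-service kafka -p '{"brokers": "..."}'
#   cf bind-service SOME_APP kafka
os.environ["VCAP_SERVICES"] = json.dumps(
    {"kafka": [{"credentials": {"brokers": "b1:9092,b2:9092"}}]})
print(kafka_brokers_from_vcap())
```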
According to the 12-factor app description (https://12factor.net/processes), Kafka should not run as an application on top of Cloud Foundry:
Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database.
Kafka is often considered a "distributed commit log" and as such carries a large amount of state. Many companies use it to keep all events flowing through their distributed system of micro services for a long (sometimes unlimited) amount of time.
Therefore I would strongly recommend going for the second option in the accepted answer: Kafka topics should be bound to your applications in the form of stateful services.