Is the Confluent Platform 7.1 based on Kafka free? open source? for production use - apache-kafka

I have a use case for Kafka and am looking for a free, open-source Kafka distribution suitable for production.
Confluent Platform 7.1 looks suitable, as it bundles ZooKeeper, Kafka, Schema Registry, and a Kafka UI.
Before going ahead with it, I want to check: is Confluent Platform 7.1 free and open source? Am I required to purchase a license or paid support?

The Confluent Community License covers several components of Confluent Platform, including ksqlDB, Schema Registry, REST Proxy, and various Kafka Connect plugins. Confluent Control Center (what you call Kafka UI) is only available free on a trial basis; beyond the trial it requires a paid Enterprise license.
The majority of individual Confluent Platform components are "source-available" and free with limitations. Many features, such as RBAC, Tiered Storage, Cluster Linking, and server-side Kafka record Schema Validation, require payment. They fall under the Enterprise license, which also includes Control Center, on-call support, and several other connectors.
Apache Kafka, its clients, and ZooKeeper are Apache 2.0 licensed.
If you want a completely Apache 2.0 stack, you can replace Confluent Schema Registry with Apicurio and replace Control Center with one of the Kafka GUI projects that exist on GitHub, such as AKHQ or CMAK.
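The licensing split described in this answer can be summarised as a small lookup table. The Python sketch below is only an illustration: the dictionaries encode the groupings stated above for Confluent Platform 7.1, the names `LICENSES`, `APACHE_REPLACEMENTS`, and `apache_only_stack` are made up for this example, and the list is neither exhaustive nor legal advice.

```python
# License groupings as described in the answer above (Confluent Platform 7.1).
# Illustrative only: not an exhaustive list and not an official Confluent API.
LICENSES = {
    "Apache Kafka": "Apache 2.0",
    "Kafka clients": "Apache 2.0",
    "ZooKeeper": "Apache 2.0",
    "ksqlDB": "Confluent Community License",
    "Schema Registry": "Confluent Community License",
    "REST Proxy": "Confluent Community License",
    "Control Center": "Confluent Enterprise License",
    "RBAC": "Confluent Enterprise License",
    "Tiered Storage": "Confluent Enterprise License",
    "Cluster Linking": "Confluent Enterprise License",
}

# Apache 2.0 alternatives named in the answer.
APACHE_REPLACEMENTS = {
    "Schema Registry": ["Apicurio"],
    "Control Center": ["AKHQ", "CMAK"],
}


def apache_only_stack(components):
    """Map each wanted component to Apache 2.0 options (empty list if none is listed here)."""
    plan = {}
    for name in components:
        if LICENSES.get(name) == "Apache 2.0":
            plan[name] = [name]  # already Apache 2.0, keep as-is
        else:
            plan[name] = APACHE_REPLACEMENTS.get(name, [])
    return plan


if __name__ == "__main__":
    wanted = ["Apache Kafka", "ZooKeeper", "Schema Registry", "Control Center"]
    for component, options in apache_only_stack(wanted).items():
        print(component, "->", ", ".join(options) or "no Apache 2.0 option listed here")
```

An empty list in the result means this answer names no Apache 2.0 replacement for that component, not that none exists.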

Related

Can I configure the number of brokers in Confluent Cloud with the Basic edition?

Because of budget constraints, I am personally using the Confluent Cloud Basic edition, which is free.
Is there any way to set the number of Kafka brokers myself?
I could not find anything related in the Confluent Cloud web UI settings.
Does only the Dedicated edition support such settings? I cannot afford that right now.
Or is it possible to configure cluster settings (such as the number of brokers) from the CLI in my local terminal?
Confluent Cloud provides a "serverless" experience, and you cannot configure the number of brokers.

Kafka Sink to Data Lake Storage without Confluent

I am trying to find options for open-source Kafka to write directly to Azure Data Lake Storage Gen2. It seems I have few options, and they mostly revolve around Confluent:
Use Confluent Cloud with Apache Kafka: I would need to subscribe to Confluent and pay charges (Confluent Cloud with ADLS)
Use an Azure VM with Confluent Hub and install Confluent Platform
At present I am not willing to pay for Confluent licensing and do not want to test with the Confluent package (more and more wrappers and hoops around).
Is there any option to have open-source Kafka write data directly to ADLS Gen2? If so, how can this be achieved? Any useful information would be appreciated.
Firstly, Kafka Connect is an Apache 2.0 licensed product and an open platform consisting of plugins; Confluent Platform/Cloud is not a requirement to use it. You can download the Azure connector as a ZIP file and install it like any other plugin.
However, it is at Confluent's (or any developer's) discretion to require a paid license agreement for their software and any support; otherwise there might be a limited trial period during which you can use the plugin.
That being said, you do not "need" Confluent Platform, and there would be no "hoops" to using it if you did, because it only adds extras to Apache Kafka and ZooKeeper; it is not its own thing (you can use your existing Kafka installation with the other Confluent products).
Regarding other open-source options: StackOverflow is not the place for software recommendations or seeking tools/libraries. That said, I'm sure you could use Spark, Flink, or NiFi to reimplement a pipeline similar to Kafka Connect, or you could write your own Kafka connector based on the open-source kafka-connect-storage-cloud project, which is used as a base for S3, GCS, and Azure, AFAIK.
There are also the Apache Camel Kafka connectors, which include an Azure Data Lake connector for sending and receiving data (sink and source). Check this out: https://camel.apache.org/camel-kafka-connector/latest/connectors/camel-azure-storage-datalake-kafka-sink-connector.html
This is a free solution that doesn't require Confluent licenses or technologies.
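Whichever connector plugin is chosen, the usual flow is to unpack it under a directory listed in the Kafka Connect worker's `plugin.path` and then start it through the worker's REST API with `POST /connectors`. The Python sketch below only builds that request. The worker URL, the connector name, the topic, and especially the `connector.class` value are placeholders, not real values; the actual class name and required settings must come from the documentation of the connector you install.

```python
import json
import urllib.request


def build_create_connector_request(connect_url, name, connector_class, topics, extra_config=None):
    """Build (but do not send) a Kafka Connect 'create connector' request."""
    config = {
        "connector.class": connector_class,  # placeholder: take this from the connector's docs
        "topics": ",".join(topics),          # sink connectors read from these topics
        "tasks.max": "1",
    }
    config.update(extra_config or {})        # connector-specific settings (credentials, paths, ...)
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_create_connector_request(
        connect_url="http://localhost:8083",                     # assumed local Connect worker
        name="adls-sink-example",                                # hypothetical connector name
        connector_class="com.example.PlaceholderSinkConnector",  # hypothetical class name
        topics=["example-topic"],                                # hypothetical topic
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually submit it to a running worker:
    # with urllib.request.urlopen(request) as response:
    #     print(response.status)
```

Keeping the request construction separate from the network call makes it easy to inspect the payload before anything is sent to a worker.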

Why would someone pick apache Kafka instead of Confluent?

I'm about to start deploying a couple of Kafka clusters to production in two different DCs. My main use is replication using MirrorMaker: continuously streaming/replicating Elasticsearch and Postgres between DCs in order to have a (near) real-time backup and failover.
What I can't get my head around is the simplest question: should I use Confluent or Apache Kafka?
I can see that Confluent adds many niceties, but what I don't get is: why would someone pick plain Apache Kafka then? I've seen this answer and it seems clear: "pick Confluent, has way more stuff".
As answered in the linked post, you can add whatever external processes you want to Apache Kafka.
Note: You are not picking either/or; you are always picking Apache Kafka. Confluent Platform adds on top of it, similar to Cloudera's Data Platform, as an alternative consideration.
If you want to connect Elasticsearch and Postgres (via JDBC), both of those connectors are source-available under the Confluent Community License, so that would be one potential reason for not using Confluent products.
Another reason: Do you need the "more stuff"? Are you able to get support from elsewhere? For example, AWS support for MSK, IBM Streams, and Azure Event Hubs do not use Confluent Platform (because that would be against the above license).
MirrorMaker and MirrorMaker2 are both under the Apache License, so they have no such usage / redistribution restrictions.
should I use Confluent or apache Kafka?
When deciding on deploying a vanilla Apache or a commercially supported product you should think about the O&M (operation and maintenance) timeline and what you gain and lose. Whatever you choose will be very difficult to replace once in production.
I'll also agree with @OneCricketeer's answer.
Do you need the "more stuff"?
I work as a professional services consultant with some Apache products. My advice is keep your stack (whatever it is) as simple as possible. So if you don't need the additional tools and functionality of Confluent, don't use them. It's how they make the product "sticky" (re: vendor lock-in).
Vanilla Apache Kafka
Pro: No vendor lock-in or dependencies
Pro: Faster updates and feature development
Con: No nice dashboards
Con: Harder to secure
Confluent
Pro: Commercial support and professional services available
Pro: More stable with fast and easy security patches
Pro: Nice dashboard and management tools
Pro: Easier to properly secure
Con: Expensive
Con: Expect vendor lock-in and frequent up-sells
My Opinion
If you have money to spare and this will be a critical piece of infrastructure I'd recommend buying through Confluent. If you try to avoid paying for them, you'll have to hire someone (expensive) who knows it anyway and you'll have to deal with the patching nightmare of open source projects.
If this is something you want to kick the tires on, can allow for downtime, or think you'll replace in 2 years, I'd just use the Apache Kafka with one of the open source dashboards.

What are the main differences between HDF schema registry and the Confluent one?

I was wondering about the differences between the Kafka embedded in the HDF suite and the Confluent one, specifically the schema registry tool.
https://registry-project.readthedocs.io/en/latest/competition.html
The Hortonworks schema registry depends on a MySQL or Postgres database to store its schemas (supposedly this is pluggable, so you could write your own storage layer), while the Confluent one stores schemas directly in Kafka. Therefore there's more infrastructure to manage with the Hortonworks implementation.
The Hortonworks one supposedly has some plugin mechanism so that it'll support the Confluent serialization format, but I've not seen it used in practice. It also has pluggable schema storage, but I've not seen anything except Avro used in it.
The Hortonworks one has its own web UI and rich editor, compared to the Confluent one, where you're limited to third-party tools or purchasing a license for Confluent Control Center.
Hortonworks aims to provide integrations with Spark, NiFi, SMM, Storm, Atlas, possibly Ranger, and other components of their HDF stack. Confluent Schema Registry support in those tools is all community-driven.

Is the Confluent Platform based on Kafka free? open source?

Kafka itself is completely free and open source.
Confluent is the for-profit company founded by the creators of Kafka. The Confluent Platform is Kafka plus various extras such as the schema registry and database connectors. I presume Confluent makes money by selling support contracts and services.
Is the Confluent Platform free and/or open source? Am I obligated to purchase licensing or paid support?
The Confluent Platform is free and open source (see https://github.com/confluentinc/ for the source).
There is also a Confluent Platform Enterprise version that includes the Control Center monitoring app (which is not open source) and includes support for the core Kafka, Confluent Platform OS components, and Control Center.
See http://www.confluent.io/product for more details.
UPDATE for Version 5.1 and above
On Dec 14th, 2018, Confluent announced that new versions of the Confluent Platform (starting with Version 5.1) will be distributed under a new Confluent Community License.
The detailed FAQ about the new license located at
https://www.confluent.io/confluent-community-license-faq includes the following question and answer:
“Is Confluent Community License open source?
Strictly speaking it is “source-available.” Many people use the phrase “open source” in a loose sense to mean that you can freely download, modify, and redistribute the code, and those things are all true of the code under the Confluent Community License. However, in the strictest sense “open source” means a license approved by the Open Source Initiative (“OSI”) which meets a particular set of criteria. The Confluent Community License is not approved by the OSI and likely would not be as it excludes the use case of creating a SaaS offering of the code. Because of this, we will not refer to the Confluent Community License or any code released under it as open source.”
Announcement of License changes:
https://www.confluent.io/blog/license-changes-confluent-platform
The actual Confluent Community License
https://www.confluent.io/confluent-community-license
Taken from their website's product page.
The Confluent Platform is an open source platform that contains all the components you need to create a scalable data platform built around Apache Kafka. These components draw on our experience building some of the largest streaming data pipelines in the world.
Here's the download link for their product. It seems like they only make money off supporting their product if you choose to pay for it. They now offer a managed cloud service as well as support for their open source products.