Is the Confluent Platform based on Kafka free? Open source?

Kafka itself is completely free and open source.
Confluent is the for-profit company founded by the creators of Kafka. The Confluent Platform is Kafka plus various extras such as the Schema Registry and database connectors. I presume Confluent makes money by selling support contracts and services.
Is the Confluent Platform free and/or open source? Am I obligated to purchase licensing or paid support?

The Confluent Platform is free and open source (see https://github.com/confluentinc/ for the source).
There is also a Confluent Platform Enterprise version that includes the Control Center monitoring app (which is not open source) and includes support for core Kafka, the open-source Confluent Platform components, and Control Center.
See http://www.confluent.io/product for more details.
UPDATE for Version 5.1 and above
On December 14, 2018, Confluent announced that new versions of the Confluent Platform (starting with version 5.1) will be distributed under a new Confluent Community License.
The detailed FAQ about the new license, located at https://www.confluent.io/confluent-community-license-faq, includes the following question and answer:
“Is Confluent Community License open source?
Strictly speaking it is “source-available.” Many people use the phrase “open source” in a loose sense to mean that you can freely download, modify, and redistribute the code, and those things are all true of the code under the Confluent Community License. However, in the strictest sense “open source” means a license approved by the Open Source Initiative (“OSI”) which meets a particular set of criteria. The Confluent Community License is not approved by the OSI and likely would not be as it excludes the use case of creating a SaaS offering of the code. Because of this, we will not refer to the Confluent Community License or any code released under it as open source.”
Announcement of the license changes:
https://www.confluent.io/blog/license-changes-confluent-platform
The Confluent Community License itself:
https://www.confluent.io/confluent-community-license

Taken from their website's product page.
The Confluent Platform is an open source platform that contains all the components you need to create a scalable data platform built around Apache Kafka. These components draw on our experience building some of the largest streaming data pipelines in the world.
Here's the download link for their product. It seems they make money from supporting their product only if you choose to pay for it. They now offer a managed cloud service as well as support for their open source products.

Related

How to set up a REST proxy on a Kafka cluster (excluding Confluent or any other third-party licensed software)?

Every search returns the Confluent pages for the Kafka REST Proxy.
Is there any other way to set it up with vanilla Apache Kafka or with an open-source alternative?
I think the community-licensed images are free for you to use. Please check the following page: https://docs.confluent.io/platform/current/installation/docker/image-reference.html
You can also consider the Strimzi HTTP bridge (https://strimzi.io/docs/bridge/latest/#api_reference-bridge) if you only need a bridge for integration.
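To make the bridge option concrete, here is a minimal sketch of producing records over HTTP with only the Python standard library. It assumes the bridge's `POST /topics/<topic>` endpoint with the `application/vnd.kafka.json.v2+json` content type and a `{"records": [...]}` body; the bridge address (`http://localhost:8080`) and topic name `events` are hypothetical, so check the API reference linked above for your bridge version.

```python
import json
import urllib.request

# Content type for JSON-embedded records (assumed v2 REST format used by the bridge).
JSON_V2 = "application/vnd.kafka.json.v2+json"


def build_produce_body(records):
    """Wrap (key, value) pairs in the {"records": [...]} envelope.

    A key of None is simply omitted from the record.
    """
    out = []
    for key, value in records:
        rec = {"value": value}
        if key is not None:
            rec["key"] = key
        out.append(rec)
    return {"records": out}


def produce(bridge_url, topic, records, timeout=10.0):
    """POST records to /topics/<topic> and return the bridge's JSON reply."""
    req = urllib.request.Request(
        url=f"{bridge_url.rstrip('/')}/topics/{topic}",
        data=json.dumps(build_produce_body(records)).encode("utf-8"),
        headers={"Content-Type": JSON_V2},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical records; only builds the body, no network call.
    body = build_produce_body([("user-1", {"action": "login"}), (None, "ping")])
    print(json.dumps(body))
    # Needs a running bridge (hypothetical address and topic):
    # produce("http://localhost:8080", "events", [("user-1", {"action": "login"})])
```

Keeping body construction separate from the HTTP call lets you inspect the payload before pointing it at a real bridge.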

Is the Confluent Platform 7.1 based on Kafka free and open source for production use?

I have a use case for Kafka and was looking for a free, open-source Kafka distribution for production.
Confluent Platform 7.1 looks suitable because it bundles ZooKeeper, Kafka, Schema Registry, and a Kafka UI together.
Before going ahead with it, I want to check whether Confluent Platform 7.1 is free and open source. Am I required to purchase a license or paid support?
The Confluent Community License covers several components of Confluent Platform, including ksqlDB, the Schema Registry, REST Proxy, and various Kafka Connect plugins. Confluent Control Center (what you call Kafka UI) is available only on a trial basis; beyond the trial, it requires a paid Enterprise license.
The majority of Confluent Platform's individual components are "source-available" and free, with limitations. Many plugin features, such as RBAC, Tiered Storage, Cluster Linking, and server-side schema validation of Kafka records, require payment. That payment is for an Enterprise license, which also covers Control Center, on-call support, and several other connectors.
Apache Kafka, its clients, and ZooKeeper are Apache 2.0 licensed.
If you want a completely Apache 2.0 stack, you can replace Confluent Schema Registry with Apicurio and replace Control Center with one of the various Kafka GUI projects on GitHub, such as AKHQ or CMAK.

Kafka Sink to Data Lake Storage without Confluent

I am trying to find options for open-source Kafka writing directly to Azure Data Lake Storage Gen2. It seems I have few options, and they mostly revolve around Confluent:
Use Confluent Cloud with Apache Kafka: requires a Confluent subscription and paid charges (Confluent Cloud with ADLS)
Use an Azure VM with Confluent Hub and install Confluent Platform
At present I am not willing to pay for Confluent licensing, and I don't want to test with the Confluent package (more and more wrappers and hoops).
Is there any option for open-source Kafka to write data directly to ADLS Gen2? If so, how can we achieve this? Any useful information would be appreciated.
First, Kafka Connect is an Apache 2.0-licensed product and an open platform built on plugins; Confluent Platform/Cloud is not required to use it. You can download the Azure connector as a ZIP file and install it like any other plugin.
However, it is at Confluent's (or any developer's) discretion to require a paid license agreement for their software and any support, and there may otherwise be a limited trial period during which you can use the plugin.
That being said, you do not "need" Confluent Platform, and even if you used it there would be no "hoops", because it only adds extras on top of Apache Kafka and ZooKeeper; it is not a separate system (you can use your existing Kafka installation with the other Confluent products).
Regarding other open-source options: Stack Overflow is not the place for software recommendations or for seeking tools/libraries. That said, I'm sure you could use Spark, Flink, or NiFi to reimplement a pipeline similar to Kafka Connect, or you could write your own Kafka connector based on the open-source kafka-connect-storage-cloud project, which, AFAIK, is used as the base for the S3, GCS, and Azure connectors.
There are the Apache Camel Kafka connectors, which include an Azure Data Lake connector for sending and receiving data (sink and source). Check this out: https://camel.apache.org/camel-kafka-connector/latest/connectors/camel-azure-storage-datalake-kafka-sink-connector.html
This is a free solution that doesn't require Confluent licenses or technologies.
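Whichever sink plugin you install (the Azure connector ZIP or the Apache Camel Azure Data Lake sink), it is created on a plain Apache Kafka Connect worker through the worker's REST API (`POST /connectors`, `GET /connectors/<name>/status`), with no Confluent tooling involved. The sketch below assumes a worker on the default port 8083; the connector name, topic, and class string are hypothetical placeholders, and the connector-specific keys (account name, container, credentials) must be copied from the chosen connector's own documentation rather than from here.

```python
import json
import urllib.request


def build_connector_payload(name, connector_class, topics, extra_config=None, tasks_max=1):
    """Build the body for Kafka Connect's POST /connectors endpoint.

    Connect's config map is string-to-string, so values are passed as strings.
    """
    config = {
        "connector.class": connector_class,
        "tasks.max": str(tasks_max),
        "topics": ",".join(topics),  # sink connectors take a comma-separated topic list
    }
    if extra_config:
        # Connector-specific settings (account, container, credentials, ...).
        config.update(extra_config)
    return {"name": name, "config": config}


def _call(method, url, body=None, timeout=10.0):
    """Send a JSON request to the Connect worker and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method=method
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def register_connector(connect_url, payload):
    """Create the connector on a running Connect worker."""
    return _call("POST", f"{connect_url.rstrip('/')}/connectors", payload)


def connector_status(connect_url, name):
    """Fetch connector and task state, e.g. to confirm the sink is RUNNING."""
    return _call("GET", f"{connect_url.rstrip('/')}/connectors/{name}/status")


if __name__ == "__main__":
    payload = build_connector_payload(
        name="adls-sink",  # hypothetical connector name
        connector_class="CONNECTOR_CLASS_FROM_DOCS",  # placeholder: copy from the connector page
        topics=["orders"],  # hypothetical topic
    )
    print(json.dumps(payload, indent=2))
    # Needs a running worker (assumed address):
    # register_connector("http://localhost:8083", payload)
    # print(connector_status("http://localhost:8083", "adls-sink"))
```

The same two calls work for any sink plugin on the worker's plugin path, which is why the choice of connector, not the choice of Confluent Platform, is what decides the licensing question.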

Why would someone pick Apache Kafka instead of Confluent?

I'm about to start deploying a couple of Kafka clusters to production in two different DCs. My main use is replication with MirrorMaker: continuously streaming/replicating Elasticsearch and Postgres between DCs in order to have a (near) real-time backup and failover.
What I can't get my head around is this simple question: should I use Confluent or Apache Kafka?
I can see that Confluent adds many niceties, but what I don't get is: why would someone pick plain Apache Kafka, then? I've seen this answer, and it seems clear: "pick Confluent, it has way more stuff".
As answered in linked post, you can add whatever external processes you want to Apache Kafka.
Note: You are not picking one or the other; you are always picking Apache Kafka. Confluent Platform adds on top of it, similar to Cloudera's Data Platform, which is an alternative worth considering.
If you want to connect Elasticsearch and Postgres (via JDBC), both of those connectors are source-available under the Confluent Community License, so that license could be one reason for not using Confluent products.
Other reasons: Do you need the "more stuff"? Can you get support elsewhere? For example, AWS MSK, IBM Streams, and Azure Event Hubs do not use Confluent Platform (because doing so would violate the above license).
MirrorMaker and MirrorMaker2 are both under the Apache License, so they have no such usage / redistribution restrictions.
should I use Confluent or apache Kafka?
When deciding between deploying vanilla Apache Kafka and a commercially supported product, you should think about the O&M (operations and maintenance) timeline and what you gain and lose. Whatever you choose will be very difficult to replace once it is in production.
I also agree with OneCricketeer's answer.
Do you need the "more stuff"?
I work as a professional services consultant with some Apache products. My advice is keep your stack (whatever it is) as simple as possible. So if you don't need the additional tools and functionality of Confluent, don't use them. It's how they make the product "sticky" (re: vendor lock-in).
Vanilla Apache Kafka
Pro: No vendor lock-in or dependencies
Pro: Faster updates and feature development
Con: No nice dashboards
Con: Harder to secure
Confluent
Pro: Commercial support and professional services available
Pro: More stable, with fast and easy security patches
Pro: Nice dashboard and management tools
Pro: Easier to properly secure
Con: Expensive
Con: Expect vendor lock-in and frequent up-sells
My Opinion
If you have money to spare and this will be a critical piece of infrastructure, I'd recommend buying through Confluent. If you try to avoid paying them, you'll have to hire someone (expensive) who knows it anyway, and you'll have to deal with the patching nightmare of open source projects.
If this is something you want to kick the tires on, can allow for downtime, or think you'll replace in two years, I'd just use Apache Kafka with one of the open-source dashboards.

What, in general, does the SCC API do?

I can't seem to find general documentation on the Microsoft SCC API. I don't want to wade through detailed documentation on the specific interfaces/methods/etc.; I would just like to know what, in general, it allows and what concepts it uses. (Edit: without having to download the whole SDK or apply for a license requiring an NDA.)
Edit: What abstraction layer does it treat as common between different systems? E.g., are there files and changesets, or just files? Does each file have a name?
As I understood it, at one time you had to be a Microsoft Partner to get the SCC API SDK, which would have included the documentation; however, I later found that they had relaxed that requirement. AFAIK, this API describes the interface between Visual Studio and an SCC provider, so it would allow you to write a provider that lets Visual Studio interact with a version control system. Microsoft examples would be the SourceSafe provider and probably the Team System provider. A non-Microsoft example would be the Visual SVN plugin for Subversion.