Apache Kafka with Apache Avro

Can someone tell me how exactly we can use Apache Kafka with Apache Avro? It seems that only the Confluent Platform supports a schema registry.
Our requirement is to use a schema registry without using a commercial product like Confluent.

Another option is Karapace, an open-source schema registry designed to be compatible with Confluent's Schema Registry; it supports Avro. https://github.com/aiven/karapace (Disclaimer: I work for Aiven and contribute to this open-source project.)
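Because Karapace implements the Confluent Schema Registry API, the stock Confluent Avro serializer can be pointed at it unchanged. A minimal producer-properties sketch (the hostname and port are placeholders, not from this answer):

    key.serializer=org.apache.kafka.common.serialization.StringSerializer
    value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
    # Any registry speaking the Confluent API works here, including Karapace
    schema.registry.url=http://karapace.example.com:8081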

You don't need a Schema Registry to use Avro. You can define your own Serializer to convert between Avro records and bytes, and just use that.
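As an illustration, here is a minimal sketch of such a serializer (the class name is hypothetical; it assumes the producer and consumer agree on the schema out of band, e.g. via a shared .avsc file):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;

    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.kafka.common.serialization.Serializer;

    // Hypothetical class (not from the answer): serializes Avro GenericRecords
    // to raw bytes with no registry involved; both sides must share the schema
    // out of band, e.g. a checked-in .avsc file in each codebase.
    public class PlainAvroSerializer implements Serializer<GenericRecord> {

        @Override
        public byte[] serialize(String topic, GenericRecord record) {
            if (record == null) {
                return null;
            }
            try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
                // Write the record in Avro binary encoding with no header/magic byte
                BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
                new GenericDatumWriter<GenericRecord>(record.getSchema()).write(record, encoder);
                encoder.flush();
                return out.toByteArray();
            } catch (IOException e) {
                throw new RuntimeException("Avro serialization failed for topic " + topic, e);
            }
        }
    }

A matching Deserializer would do the reverse with a GenericDatumReader constructed from that same shared schema.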
Otherwise, Confluent's Schema Registry is "royalty free", not commercialized; the main thing you can't do is offer it as a public API to compete with Confluent Cloud. There are also other similar projects, like Apicurio or Hortonworks' Registry, that can be found on GitHub and are Apache 2.0 licensed.

Related

Is the Kafka-based Confluent Platform 7.1 free and open source for production use?

I have a use case to start using Kafka and was looking for an open-source, free (production) Kafka distribution.
On checking, Confluent Platform 7.1 looks suitable, as it has ZooKeeper, Kafka, Schema Registry, and a Kafka UI bundled together.
Before deciding to go ahead with it, I just want to check: is Confluent Platform 7.1 free and open source? Am I required to purchase licensing or paid support?
The Confluent Community License covers several components of Confluent Platform, including ksqlDB, the Schema Registry, REST Proxy, and various Kafka Connect plugins. Confluent Control Center (what you call the Kafka UI) is only available on a trial basis, beyond which it requires an Enterprise license payment.
The majority of Confluent Platform's individual components are "source-available" and free with limitations. Many of the plugin features, like RBAC, Tiered Storage, Cluster Linking, and server-side schema validation of Kafka records, require payment. This Enterprise license also includes Control Center, on-call support, and several other connectors.
Apache Kafka, its clients, and ZooKeeper are Apache 2.0 licensed.
If you want a completely Apache 2.0 stack, you can replace Confluent Schema Registry with Apicurio and replace Control Center with one of the various Kafka GUI projects that exist on GitHub, such as AKHQ or CMAK.

Confluent Platform vs. Debezium

I'm trying to use the Debezium platform to do change data capture (CDC) into Kafka, but I am confused.
What is the real difference between Confluent Platform and Debezium?
Confluent (https://www.confluent.io/) is a platform that mainly integrates Apache Kafka (https://kafka.apache.org/) and its ecosystem. The basic Confluent Platform bundles ZooKeeper, Apache Kafka, ksqlDB, and Control Center.
Debezium is another platform, focused on database streaming.
So think of Confluent as general-purpose streaming, while Debezium provides connectors (https://debezium.io/documentation/reference/stable/connectors/index.html) that can be integrated with Confluent, as in https://www.confluent.io/hub/debezium/debezium-connector-postgresql
At the time of writing, Confluent Platform does not have any CDC connectors of its own, and you don't really need the platform anyway. Apache Kafka Connect, which is bundled as part of the Confluent Platform, is all that's needed, and it can be downloaded directly from the Apache Kafka site instead.
Debezium is built on the Kafka Connect API and provided as a plugin.
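For illustration, a sketch of what registering Debezium's Postgres plugin with a standalone Kafka Connect worker might look like (all names, hosts, and credentials below are placeholders, and exact property names vary by Debezium version):

    name=inventory-cdc
    connector.class=io.debezium.connector.postgresql.PostgresConnector
    database.hostname=localhost
    database.port=5432
    database.user=postgres
    database.password=secret
    database.dbname=inventory
    # Debezium 2.x naming; older releases use database.server.name instead
    topic.prefix=inventory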

Upgrading Apache Kafka to Confluent Platform

We are planning to upgrade our existing Apache Kafka cluster to Confluent Kafka. While upgrading, will we have any data loss in the topics? Also, the main reason we are upgrading is to use the S3 sink connector; is there any such connector available in Apache Kafka itself?
Unless you want to migrate to Confluent Server, there is nothing you need to migrate; Confluent Platform includes Apache Kafka.
Kafka Connect, on the other hand, is a pluggable environment, and doesn't require any Confluent tools/systems other than the specific JAR file(s) for the S3 connector.
You can use the S3 sink connector from Apache Camel:
https://camel.apache.org/camel-kafka-connector/next/reference/connectors/camel-aws-s3-sink-kafka-sink-connector.html
You just need to download the S3 sink connector JAR file from this link:
https://camel.apache.org/camel-kafka-connector/next/reference/index.html
Then copy the JAR file into the connector plugins path. That path depends on your worker properties: by default it is the relative path plugins/connectors, or whatever is set in the plugin.path property (see the sketch below).
This way you don't need to restart the Kafka cluster or lose any data.
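For reference, the relevant worker setting is just a path list; a minimal sketch (the directory is an example, not from the answer):

    # connect-distributed.properties (or connect-standalone.properties)
    # Directory the Connect worker scans for plugins at startup; unpack
    # the Camel S3 connector archive under this path.
    plugin.path=/opt/kafka/plugins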

Kafka Sink to Data Lake Storage without Confluent

I am trying to find options for open-source Kafka writing directly to Azure Data Lake Storage Gen2. It seems I have few options, and they mostly circle around Confluent, like below:
Use Confluent Cloud with Apache Kafka: need to subscribe with Confluent and pay charges (Confluent Cloud with ADLS).
Use an Azure VM with Confluent Hub and install Confluent Platform.
At present I am not willing to pay for Confluent licensing and do not want to test with the Confluent package (more and more wrappers and hoops around).
Is there any option to use open-source Kafka to write data directly to ADLS Gen2? If yes, how can we achieve this? Any useful information to share?
Firstly, Kafka Connect is an Apache 2.0-licensed product and an open platform consisting of plugins; Confluent Platform/Cloud is not a requirement to use it. You can download the Azure connector as a ZIP file and install it like any other plugin.
However, it is at Confluent's (or any developer's) discretion to provide a paid license agreement for their software and any support, and there might otherwise be a limited trial period during which you can use the plugin.
That being said, you do not "need" Confluent Platform, and there are no "hoops" to using it if you did, because it only adds extras to Apache Kafka + ZooKeeper; it is not its own thing (you can use your existing Kafka installation with the other Confluent products).
Regarding other open-source options: Stack Overflow is not the place for software recommendations or for seeking tools/libraries. You could use Spark/Flink/NiFi to reimplement a pipeline similar to Kafka Connect, or you could write your own Kafka connector based on the open-source kafka-connect-storage-cloud project, which is used as the base for the S3, GCS, and Azure connectors, AFAIK.
There are also the Apache Camel connectors, which include an Azure Data Lake connector for sending and receiving data (sink and source). Check this out: https://camel.apache.org/camel-kafka-connector/latest/connectors/camel-azure-storage-datalake-kafka-sink-connector.html
This is a free solution that doesn't require any Confluent licenses or technologies.
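As with the Camel S3 connector above, installation is a plugin-path entry plus a connector config. A hedged sketch (the connector class name is inferred from Camel's naming convention and everything else is a placeholder, so verify both against the linked reference page):

    name=adls-sink
    connector.class=org.apache.camel.kafkaconnector.azurestoragedatalakesink.CamelAzurestoragedatalakesinkSinkConnector
    topics=my-topic
    # ...plus the Azure account/credential options listed on the linked page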

What are the main differences between HDF schema registry and the Confluent one?

I was wondering about the differences between the Kafka embedded in the HDF suite and the Confluent one, specifically the schema registry tool.
https://registry-project.readthedocs.io/en/latest/competition.html
The Hortonworks schema registry depends on a MySQL or Postgres database to store its schemas (supposedly this is pluggable, so you could write your own storage layer), while the Confluent one stores schemas directly in Kafka. Therefore there's more infrastructure to manage with the Hortonworks implementation.
The Hortonworks one supposedly has a plugin mechanism so that it will support the Confluent serialization format, but I've not seen it used in practice. It also has pluggable schema formats, but I've not seen anything except Avro used with it.
The Hortonworks one has its own web UI and rich editor, compared to the Confluent one, where you're limited to third-party tools or purchasing a license for Confluent Control Center.
Hortonworks aims to provide integrations with Spark, NiFi, SMM, Storm, Atlas, possibly Ranger, and other components of their HDF stack. Confluent Schema Registry support in those tools is entirely community-driven.