Metadata information from kafka - apache-kafka

I am new to Confluent/Kafka and I want to find metadata information from Kafka.
I want to know:
the list of producers
the list of topics
the schema information for a topic
The Confluent version is 5.0.
What are the classes (methods) that can give this information?
Are there any REST APIs for the same?
Also, is a ZooKeeper connection necessary to get this information?

1) I don't think that Kafka brokers are aware of producers that produce messages in topics and therefore there is no command line tool for listing them. However, an answer to this SO question suggests that you can list producers by viewing the MBeans over JMX.
2) In order to list the topics you need to run:
kafka-topics --zookeeper localhost:2181 --list
Otherwise, if you want to list the topics using a Java client, you can call the listTopics() method of KafkaConsumer.
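For example, a minimal sketch (the bootstrap server address below is an assumption, adjust it to your cluster):
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ListTopics {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // listTopics() returns a map of topic name -> partition metadata
            Map<String, List<PartitionInfo>> topics = consumer.listTopics();
            topics.keySet().forEach(System.out::println);
        }
    }
}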
You can also fetch the list of topics through ZooKeeper, using the zkclient library and Kafka's internal ZkUtils helper:
ZkClient zkClient = new ZkClient("zkHost:zkPort");
List<String> topics = JavaConversions.asJavaList(ZkUtils.getAllTopics(zkClient));
3) To get the schema information for a topic you can use Schema Registry API
In particular, you can fetch all subjects by calling:
GET /subjects HTTP/1.1
Host: schemaregistry.example.com
Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json
which should give a response similar to the one below:
HTTP/1.1 200 OK
Content-Type: application/vnd.schemaregistry.v1+json
["subject1", "subject2"]
You can then get all the versions of a particular subject:
GET /subjects/subject-name/versions HTTP/1.1
Host: schemaregistry.example.com
Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json
And finally, you can get a specific version of the schema registered under this subject
GET /subjects/subject-name/versions/1 HTTP/1.1
Host: schemaregistry.example.com
Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json
Or just the latest registered schema:
GET /subjects/subject-name/versions/latest HTTP/1.1
Host: schemaregistry.example.com
Accept: application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json
In order to perform such actions in Java, you can either prepare your own GET requests (see how to do it here) or use Confluent's Schema Registry Java client. You can see the implementation and the available methods in their GitHub repo.
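As a rough sketch of the same lookups with the Java client (the registry URL below is an assumption, and the class and method names are taken from the schema-registry-client artifact shipped with Confluent 5.0, so double-check them against the version you use):
import java.util.Collection;
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

public class ListSchemas {
    public static void main(String[] args) throws Exception {
        // assumption: Schema Registry listening on its default port
        SchemaRegistryClient client = new CachedSchemaRegistryClient("http://localhost:8081", 100);

        // Equivalent of GET /subjects
        Collection<String> subjects = client.getAllSubjects();
        for (String subject : subjects) {
            // Equivalent of GET /subjects/<subject>/versions/latest
            SchemaMetadata latest = client.getLatestSchemaMetadata(subject);
            System.out.println(subject + " -> version " + latest.getVersion() + ": " + latest.getSchema());
        }
    }
}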
Regarding your question about Zookeeper, note that ZK is a requirement for Kafka.
Kafka uses ZooKeeper so you need to first start a ZooKeeper server if you don't already have one. You can use the convenience script packaged with kafka to get a quick-and-dirty single-node ZooKeeper instance.

Related

How to hide stack trace of Kafka connect api unhandled exception

As part of registering connectors in distributed mode, when an invalid JSON payload is passed in the API request, I get an error response with the full stack trace, which is not desirable in my case.
Response example:
HTTP/1.1 500 Internal Server Error
Connection: close
Date: Fri, 26 Jul 2019 08:27:17 GMT
Content-Type: application/json
Content-Length: 443
Server: Jetty(9.4.11.v20180605)
{"error_code":500,"message":"Cannot construct instance of `java.util.LinkedHashMap` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('')\n at [Source: (org.glassfish.jersey.message.internal.ReaderInterceptorExecutor $UnCloseableInputStream); line: 2, column: 46] (through reference chain: org.apache.k afka.connect.runtime.rest.entities.CreateConnectorRequest[\"config\"])"}
Is there any way to hide or shorten the full stack trace?
Note: I am using Hortonworks Kafka package.
The REST API returns what the REST API returns :)
So you could either (a) stop sending invalid JSON ;) or (b) compile your own version of Kafka and disable the bits of output you don't want returned.
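If you go with option (a), a small pre-flight check on the client side can catch this particular failure before the request is ever sent. A minimal sketch using Jackson (the payload string and class name are illustrative, not part of any Connect API):
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class ConnectorPayloadCheck {
    public static void main(String[] args) throws Exception {
        // illustrative payload; in practice this is whatever you are about to POST
        String payload = "{\"name\":\"my-connector\",\"config\":{\"connector.class\":\"...\"}}";

        ObjectMapper mapper = new ObjectMapper();
        JsonNode root = mapper.readTree(payload); // throws if the payload is not valid JSON
        if (root.path("config").isObject()) {
            System.out.println("Payload looks valid, safe to submit");
        } else {
            // this is exactly the case the 500 above complains about
            System.out.println("\"config\" must be a JSON object, not a string");
        }
    }
}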

Issues while running kafka connect in distributed mode

We are testing Kafka Connect in distributed mode to pull topic records from Kafka to HDFS. We have two boxes: one in which the Kafka and ZooKeeper daemons are running, where we have kept one instance of Kafka Connect, and another box where the HDFS namenode is present, where we have kept another instance of Kafka Connect.
We started Kafka, ZooKeeper and Kafka Connect on the first box, and Kafka Connect on the second box as well. Now, as per the Confluent documentation, we have to start the HDFS connector (or any other connector for that matter) using the REST API. So, after starting Kafka Connect on these two boxes, we tried starting the connector through the REST API with the below command:
curl -X POST -H "HTTP/1.1 Host: ip-10-16-34-57.ec2.internal:9092 Content-Type: application/json Accept: application/json" --data '{"name": "hdfs-sink", "config": {"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector", "format.class":"com.qubole.streamx.SourceFormat", "tasks.max":"1", "hdfs.url":"hdfs://ip-10-16-37-124:9000", "topics":"Prd_IN_TripAnalysis,Prd_IN_Alerts,Prd_IN_GeneralEvents", "partitioner.class":"io.confluent.connect.hdfs.partitioner.DailyPartitioner", "locale":"", "timezone":"Asia/Calcutta" }}' http://ip-10-16-34-57.ec2.internal:8083/connectors
As soon as we press enter here, we get below response:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 415 </title>
</head>
<body>
<h2>HTTP ERROR: 415</h2>
<p>Problem accessing /connectors. Reason:
<pre> Unsupported Media Type</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>
The connect-distributed.properties file at etc/kafka/ is shown below and is the same on both Kafka Connect nodes. We have also created the three required topics (connect-offsets, connect-configs, connect-status).
bootstrap.servers=ip-10-16-34-57.ec2.internal:9092
group.id=connect-cluster
key.converter=com.qubole.streamx.ByteArrayConverter
value.converter=com.qubole.streamx.ByteArrayConverter
enable.auto.commit=true
auto.commit.interval.ms=1000
offset.flush.interval.ms=1000
key.converter.schemas.enable=true
value.converter.schemas.enable=true
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.topic=connect-offsets
rest.port=8083
config.storage.topic=connect-configs
status.storage.topic=connect-status
offset.flush.interval.ms=10000
What is the issue here? Are we missing something needed to run Kafka Connect in distributed mode with the HDFS connector? Kafka Connect in standalone mode is working fine.
To upload a connector configuration, this should be a PUT, not a POST: http://docs.confluent.io/3.1.1/connect/restapi.html#put--connectors-(string-name)-config
On a side note, I believe your curl command might be wrong:
you need one -H switch per header; putting all headers in one -H parameter is not how it works (I think).
I do not think that the port is part of the Host header.
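For reference, here is a minimal sketch of the corrected request from Java, using the JDK 11+ HttpClient; the host, port and configuration values are copied from the question and may need adjusting. It sends only the configuration map to PUT /connectors/<name>/config and sets a single Content-Type header:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CreateHdfsSink {
    public static void main(String[] args) throws Exception {
        // PUT /connectors/{name}/config takes only the configuration map as the body
        String config = "{"
                + "\"connector.class\": \"io.confluent.connect.hdfs.HdfsSinkConnector\","
                + "\"format.class\": \"com.qubole.streamx.SourceFormat\","
                + "\"tasks.max\": \"1\","
                + "\"hdfs.url\": \"hdfs://ip-10-16-37-124:9000\","
                + "\"topics\": \"Prd_IN_TripAnalysis,Prd_IN_Alerts,Prd_IN_GeneralEvents\""
                + "}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://ip-10-16-34-57.ec2.internal:8083/connectors/hdfs-sink/config"))
                .header("Content-Type", "application/json") // one header per call, like one -H per header in curl
                .PUT(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}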

Argument type mismatch using nifi template import API

I am trying to use the import endpoint of the NiFi REST API 1.0. I have exported a template as XML using the UI, and am trying to import it using Postman. The request looks like this:
POST /nifi-api/process-groups/63dcaf98-0158-1000-04da-dd54bbb3a5b8/templates/import HTTP/1.1
Host: localhost:8080
Content-Type: application/xml
Cache-Control: no-cache
Postman-Token: 37a10e8b-b30d-b5c8-4219-ca1ba34f79da
<?xml version="1.0" ?>
<template encoding-version="1.0">
<description></description>
...
</template>
I get a 400 error in return, with the message argument type mismatch. There's nothing very useful in nifi-user.log:
2016-11-14 14:58:22,164 INFO [NiFi Web Server-327] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) POST http://localhost:8080/nifi-api/process-groups/63dcaf98-0158-1000-04da-dd54bbb3a5b8/templates/import (source ip: 127.0.0.1)
2016-11-14 14:58:22,231 INFO [NiFi Web Server-327] o.a.n.w.a.c.IllegalArgumentExceptionMapper java.lang.IllegalArgumentException: argument type mismatch. Returning Bad Request response.
Any ideas what may be causing this, or how I can debug?
Try wrapping the root template element with another element called templateEntity. Most endpoints in Apache NiFi 1.0.0 wrap the object in question with an entity object to relay relevant details about the object (for instance, when access is denied) and to help promote the multi-tenancy model. This pattern was applied to most endpoints to help with consistency throughout the API.
Also, you can get additional details by enabling debug-level logging for
<logger name="org.apache.nifi.web.api.config" level="DEBUG" additivity="false">
in conf/logback.xml.
First you have to upload the template with the following command (I use curl):
curl -iv -F template=@Sample_Process_group.xml -X POST http://172.17.0.4:8080/nifi-api/process-groups/2a9c6a0d-015c-1000-dec6-e81122344f7e/templates/upload
where the GUID in the URL is the ID of your root process group.

webHDFS API returns Exception on every query

I set up a single-node Hadoop cluster to perform some experiments with HDFS. Via web access all looks good; I created a dedicated folder and copied a file from the local system to it using the command line. It all appeared in the web UI. After that I tried to get access to it via WebHDFS.
For example:
curl -i "http://127.0.0.1:50075/webhdfs/v1/?op=LISTSTATUS"
But after that I get:
HTTP/1.1 400 Bad Request
Content-Type: application/json; charset=utf-8
Content-Length: 154
Connection: close
{
"RemoteException":
{
"exception":"IllegalArgumentException",
"javaClassName":"java.lang.IllegalArgumentException",
"message":"Invalid operation LISTSTATUS"
}
}
I receive the same error for any other operation.
I have no idea what went wrong here. Could it be caused, for example, by some missing components or something else during setup?
For HDP you can use the following URL (with the default port). Note that WebHDFS metadata operations such as LISTSTATUS have to be sent to the NameNode's web port (50070 by default), not to the DataNode port 50075 used in the question:
http://x.x.x.x:50070/webhdfs/v1/?op=LISTSTATUS
For a MapR cluster (with the default port):
http://x.x.x.x:14000/webhdfs/v1/user?op=LISTSTATUS&user.name=YOUR_USER
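If you prefer to do this from Java instead of curl, here is a minimal sketch with the Hadoop FileSystem API over the webhdfs scheme (the host and port are assumptions based on Hadoop 2.x defaults, and it requires the hadoop-client dependency on the classpath):
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WebHdfsList {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // assumption: NameNode WebHDFS endpoint on the Hadoop 2.x default port 50070
        FileSystem fs = FileSystem.get(URI.create("webhdfs://127.0.0.1:50070"), conf);

        // equivalent of ?op=LISTSTATUS on the root path
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}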

Kafka Create Topic API Options for Non-Java Languages

While you can create a topic via Java or Java-based languages (see here), there does not seem to be a clean way to do this without using Java. As a result, pure-language client APIs (like kafka-node, a pure JavaScript client) can't directly create topics. Instead, we have two options:
1) Use a hack like sending a metadata request to a topic -- if auto.create.topics.enable is set to true, then you can create a topic -- but only with the default configuration, no control over partitions, etc.
2) Write a wrapper around a Java-based client just for topic creation. The easiest way to do this is to exec the script bin/kafka-topics.sh with command line arguments, which is ugly, to say the least.
Is there a better way to do this, though? There's a pure-JavaScript client for ZooKeeper, node-zookeeper-client; what happens if I manipulate broker / partition info directly in ZooKeeper?
Any other thoughts?
You can now use the REST Proxy API v3 to create Kafka topics with HTTP requests from non-Java languages.
According to the Confluent REST Proxy API Reference the creation of a topic is possible with the REST Proxy API v3 that is currently available as a preview feature.
"The API v3 can be used for evaluation and non-production testing purposes or to provide feedback to Confluent."
An example of a topic creation request is presented below and documented here:
POST /v3/clusters/cluster-1/topics HTTP/1.1
Host: kafkaproxy.example.com
Content-Type: application/vnd.api+json
Accept: application/vnd.api+json
{
  "data": {
    "attributes": {
      "topic_name": "topic-1",
      "partitions_count": 2,
      "replication_factor": 3,
      "configs": [
        {
          "name": "cleanup.policy",
          "value": "compact"
        }
      ]
    }
  }
}
Using curl:
curl -X POST -H "Content-Type: application/vnd.api+json" -H "Accept: application/vnd.api+json" \
--data '{"data":{"attributes": {"topic_name": "topic-1", "partitions_count": 2, "replication_factor": 1, "configs": [{"name": "cleanup.policy","value": "compact"}]}}}' \
"http://localhost:8082/v3/clusters/<cluster-id>/topics"
where the cluster-id can be identified using
curl -X GET -H "Accept: application/vnd.api+json" localhost:8082/v3/clusters