I am trying to load a 1 GB CSV file into a Kafka topic using the SpoolDir connector in Kafka Connect.
The issue is that the connector task fails after I run the command.
If I use "errors.tolerance": "all", the connector processes the data, but it does not show up in the topic. I used kafkacat as a consumer to check the data in the topic.
I used the following command:
curl -i -X PUT -H "Accept:application/json" -H "Content-Type:application/json" \
  http://localhost:8083/connectors/source-csv-spooldir-1/config \
  -d '{"connector.class":"com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector",
  "topic": "data",
  "input.path": "/data/unprocessed",
  "finished.path": "/data/processed",
  "error.path": "/data/error",
  "input.file.pattern": ".*\\.csv",
  "schema.generation.enabled": "true",
  "csv.first.row.as.header": "true",
  "errors.tolerance": "all"
  }'
Also, can anyone tell me how to specify key.schema and value.schema in this command?
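From the connector's documentation, I believe they are passed as escaped JSON strings inside the same config, with schema generation turned off. Something like the sketch below is what I'm imagining; the schema names and fields here are just my guesses for my data, and I haven't verified this:
"schema.generation.enabled": "false",
"key.schema": "{\"name\":\"com.example.DataKey\",\"type\":\"STRUCT\",\"isOptional\":false,\"fieldSchemas\":{\"id\":{\"type\":\"INT64\",\"isOptional\":false}}}",
"value.schema": "{\"name\":\"com.example.DataValue\",\"type\":\"STRUCT\",\"isOptional\":false,\"fieldSchemas\":{\"id\":{\"type\":\"INT64\",\"isOptional\":false},\"name\":{\"type\":\"STRING\",\"isOptional\":true}}}"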
My Kafka cluster runs on Kubernetes, and I am using a custom image to run Kafka Connect with the required official MongoDB source and sink connectors.
My MongoDB instance also runs on Kubernetes. My issue is that I am unable to connect my live DB to Kafka Connect.
My connector config currently looks like this:
curl -X PUT \
-H "Content-Type: application/json" \
--data '{
"connector.class":"com.mongodb.kafka.connect.MongoSourceConnector",
"tasks.max": "1",
"connection.uri": "mongodb://192.168.190.132:27017,192.168.190.137:27017",
"database": "tractor",
"collection": "job",
"topic.prefix": "testing-mongo"
}' \
http://10.108.202.171:8083/connectors/mongo_source_job/config
Thanks for your reply. The issue was stemming from TLS. I modified my config as follows:
"connection.uri": "mongodb://192.168.190.132:27017,192.168.190.137:27017/?tlsInsecure=true"
It's working now!
Can you try connecting to the MongoDB service using its service name?
kubectl get service -n <namespace>
Use the command above to list the services in MongoDB's namespace, then use the service name instead of the IPs you have and see if that works.
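For example, if the MongoDB service were called mongodb and lived in namespace db (hypothetical names), the URI would look something like:
"connection.uri": "mongodb://mongodb.db.svc.cluster.local:27017"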
I'm trying out the REST Proxy in Kafka.
When I enter the following URL in my browser, http://192.168.0.30:8082/topics, I get the expected results:
["__confluent.support.metrics","_confluent-command","_confluent-controlcenter-5-
2-2-1-MetricsAggregateStore-changelog","_confluent-controlcenter-5-2-2-1-actual-
group-consumption-rekey","_confluent-controlcenter-5-2-2-1-expected-group-
consumption-rekey","_confluent-controlcenter-5-2-2-1-metrics-trigger-measurement-
rekey","_confluent-ksql-default__command_topic","_confluent-metrics","_confluent-
monitoring","_schemas","connect-configs","connect-offsets","connect-
statuses","default_ksql_processing_log","test","test1"]
My question: I'd prefer not to use curl. I have the following curl command examples. If I want to use only my browser, as above, how can I change them?
I tried this, but... (How can I consume my topic test?)
**Just an example from a document:**
# Create a consumer for binary data, starting at the beginning of the topic's
# log. Then consume some data from a topic.
$ curl -X POST -H "Content-Type: application/vnd.kafka.v1+json" \
--data '{"id": "my_instance", "format": "binary", "auto.offset.reset": "smallest"}' \
http://localhost:8082/consumers/my_binary_consumer
{"instance_id":"my_instance","base_uri":"http://localhost:8082/consumers/my_binar
y_consumer/instances/my_instance"}
$ curl -X GET -H "Accept: application/vnd.kafka.binary.v1+json" \
http://localhost:8082/consumers/my_binary_consumer/instances/my_instance/topics/test
[{"key":null,"value":"S2Fma2E=","partition":0,"offset":0}]
Browsers can only issue GET requests.
You could use tools like Postman or Insomnia to issue other HTTP requests.
(For further reference)
I used Postman in order to use REST Proxy for Kafka.
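(A consumer instance has to exist before you can subscribe. If you haven't created one yet, a request along these lines does it; this sketch assumes the v2 API and reuses the instance name from the requests below:)
$ curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{"name": "my_consumer_instance", "format": "json", "auto.offset.reset": "earliest"}' \
http://192.168.0.30:8082/consumers/my_json_consumer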
1. I subscribed to the topic test.
$ curl -X POST -H "Content-Type: application/vnd.kafka.v2+json" \
--data '{"topics":["test"]}' \
http://192.168.0.30:8082/consumers/my_json_consumer/instances/my_consumer_instance/subscription
(I adapted this curl command to fit into Postman.)
2. Then, I consumed the topic.
$ curl -X GET -H "Accept: application/vnd.kafka.json.v2+json" \
http://192.168.0.30:8082/consumers/my_json_consumer/instances/my_consumer_instance/records
(I adapted this curl command to fit into Postman.)
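When finished, the consumer instance can be deleted so it doesn't linger on the proxy (again assuming the v2 API):
$ curl -X DELETE -H "Content-Type: application/vnd.kafka.v2+json" \
http://192.168.0.30:8082/consumers/my_json_consumer/instances/my_consumer_instance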
I would love to set up a cluster of JDBC Kafka connectors and configure them to pull from multiple databases running on the same host.
I've been looking through the Kafka Connect documentation, and it appears that once you configure the JDBC connector, it can only pull data from a single database.
Can anyone confirm this?
It depends on the mode in which you start your workers (standalone or distributed):
In standalone mode, you can start multiple JDBC connectors by using:
bin/connect-standalone worker.properties connector1.properties [connector2.properties connector3.properties ...]
Where each connector.properties file matches one database.
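For example, each file might look something like this (hypothetical values; the class and settings mirror the distributed example below):
# connector1.properties -- point each such file at a different database
name=jdbc-source-db1
connector.class=JdbcSourceConnector
tasks.max=1
connection.url=jdbc:sqlite:db1.db
mode=bulk
topic.prefix=connect-jdbc-db1-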
More details here: Running workers
In distributed mode, first start the workers with:
bin/connect-distributed worker.properties
Then push each configuration with a POST HTTP request, like:
$ curl -X POST -H "Content-Type: application/json" --data '{"name": "jdbc-source", "config": {"connector.class":"JdbcSourceConnector", "tasks.max":"1", "connection.url":"jdbc:sqlite:test.db", "topic.prefix":"connect-jdbc-test-", "mode":"bulk" }}' http://worker_host:8083/connectors
Or, to use a file containing the JSON-formatted configuration:
$ curl -X POST -H "Content-Type: application/json" --data @config.json http://worker_host:8083/connectors
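Where config.json contains the same JSON as the inline example above, e.g.:
{
  "name": "jdbc-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "tasks.max": "1",
    "connection.url": "jdbc:sqlite:test.db",
    "topic.prefix": "connect-jdbc-test-",
    "mode": "bulk"
  }
}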
Frequently when developing with Message Hub, I find that I want to purge my development data from a topic.
How can I purge a Message Hub topic?
This question is similar to Purge Kafka Queue but differs because that question is directed at Apache Kafka, and I'm not sure whether Message Hub supports the Kafka command-line tools.
The only way to purge a Kafka topic from within Message Hub is to delete and recreate the topic. You can do this manually using the Web UI provided by the Message Hub service. Alternatively you can use the REST API for administering Kafka topics. The advantage of using the REST API is that it can be scripted.
The Message Hub REST API is documented in Swagger here: https://github.com/ibm-messaging/message-hub-docs/blob/master/kafka-administration-api/KafkaTopicManagement.yaml. If you are not a Swagger guru, the REST call to delete a topic is:
DELETE /admin/topics/<TOPICNAME>
You will need to specify your Message Hub API key (from VCAP_SERVICES) using the X-Auth-Token header to authenticate the request. So a sample curl implementation would look like:
curl -k -v -X DELETE -H 'Content-Type: application/json' -H 'Accept: */*' \
-H 'X-Auth-Token: yourapikeyhere' \
https://admin-endpoint-goes-here/admin/topics/<TOPICNAME>
The one gotcha is that Kafka topic deletion is asynchronous. So before you can re-create the topic, you need to make sure that the deletion process for the original topic has completed. This can be achieved by polling the following endpoint until it returns a 404 (Not Found) status code:
GET /topics/<TOPICNAME>
(Again the X-Auth-Token header must be present).
In curl:
curl -k -v -H 'Accept: application/json' \
-H 'X-Auth-Token: yourapikeyhere' \
https://admin-endpoint-goes-here/topics/<TOPICNAME>
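A minimal shell sketch of such a polling loop (same endpoint and token as above, substituting your topic name):
# poll until the topic is gone (HTTP 404); then it is safe to recreate it
until [ "$(curl -k -s -o /dev/null -w '%{http_code}' \
    -H 'X-Auth-Token: yourapikeyhere' \
    https://admin-endpoint-goes-here/topics/<TOPICNAME>)" = "404" ]; do
  sleep 2
done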
(Re-)creating a topic requires the following REST request (also with an X-Auth-Token):
POST /admin/topics
The body of the request contains a JSON document with parameters describing the topic to create. For example:
{
"name": "TOPICNAME",
"partitions": 2
}
In curl this would be:
curl -k -v -H 'Content-Type: application/json' -H 'Accept: */*' \
-H 'X-Auth-Token: yourapikeyhere' \
-d '{ "name": "TOPICNAME", "partitions": 2 }' \
https://admin-endpoint-goes-here/admin/topics
I'm trying to use the REST API on Couchbase 2.2 and I'm finding two things that I cannot seem to do via REST:
Init a new cluster when no other nodes exist.
CLI version:
couchbase-cli cluster-init -u admin -p mypw -c localhost:8091 --cluster-init-ramsize=1024
Remove a healthy node from the cluster.
CLI version:
couchbase-cli rebalance -u admin -p mypw -c 10.10.1.10:8091 --server-remove=10.10.1.12
As for removing a node, I've tried:
curl -u admin:mypw -d otpNode=ns_1#10.10.1.12 \
http://10.10.1.10:8091/controller/ejectNode
Which returns: "Cannot remove active server."
I've also tried:
curl -s -u Administrator:myclusterpw \
-d 'ejectedNodes=ns_1%4010.10.1.12&knownNodes=ns_1%4010.10.1.10%2Cns_1%4010.10.1.11' \
http://10.10.1.10:8091/controller/rebalance
Which returns: {"mismatch":1} (presumably due to the node actually not being marked for ejection?)
Am I crazy, or are there no ways to do these things using curl?
I spun up a two-node cluster on AWS (10.170.76.236 and 10.182.151.86) and was able to remove node 10.182.151.86 using the curl request below:
curl -v -u Administrator:password -X POST 'http://10.182.151.86:8091/controller/rebalance' -d 'ejectedNodes=ns_1#10.182.151.86&knownNodes=ns_1#10.182.151.86,ns_1#10.170.76.236'
That removes the node and performs the rebalance, leaving 10.170.76.236 as the single remaining node. Running the request below results in 'Cannot remove active server', as you have experienced:
curl -u Administrator:password -d otpNode=ns_1#10.170.76.236 http://10.170.76.236:8091/controller/ejectNode
This is because you can't remove the last node, since you can't perform a rebalance; this issue is covered here: http://www.couchbase.com/issues/browse/MB-7517
I left in the real IPs I used so the curl requests are as clear as possible; I've terminated the nodes now, though :)
Initializing a new cluster (your first question) works with a combo of:
curl -X POST -u admin:password -d username=Administrator \
-d password=letmein \
-d port=8091 \
http://localhost:8091/settings/web
and
curl -X POST -u admin:password -d memoryQuota=400 \
http://localhost:8091/pools/default
The ticket raised against this indicates that the ejectNode command itself won't work by design:
the server seemingly needs to be in either a pending or failover state to use that command.
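If so, presumably failing the node over first would let ejectNode succeed. An untested sketch, using the failover endpoint from the Couchbase REST API docs:
# fail the node over first (untested; /controller/failOver is documented in the Couchbase REST API)
curl -u admin:mypw -d otpNode=ns_1#10.10.1.12 \
    http://10.10.1.10:8091/controller/failOver
# the original ejectNode request should then be accepted
curl -u admin:mypw -d otpNode=ns_1#10.10.1.12 \
    http://10.10.1.10:8091/controller/ejectNode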