Setting up Kafka mirroring using Brooklin

I am trying out Brooklin for mirroring data between Kafka clusters. I am following the wiki https://github.com/linkedin/brooklin/wiki/mirroring-kafka-clusters
Unlike the wiki, I am trying to set up mirroring between two different clusters. I am able to start the Brooklin process and the Datastream, but I cannot manage to mirror any messages. Brooklin is currently running on the source Kafka cluster. I am trying to mirror the topic 'test'.
The server.properties for Brooklin is:
############################# Server Basics #############################
brooklin.server.coordinator.cluster=brooklin-cluster
brooklin.server.coordinator.zkAddress=localhost:2181
brooklin.server.httpPort=32311
brooklin.server.connectorNames=file,test,kafkaMirroringConnector
brooklin.server.transportProviderNames=kafkaTransportProvider
brooklin.server.csvMetricsDir=/tmp/brooklin-example/
########################### Transport provider configs ######################
brooklin.server.transportProvider.kafkaTransportProvider.factoryClassName=com.linkedin.datastream.kafka.KafkaTransportProviderAdminFactory
brooklin.server.transportProvider.kafkaTransportProvider.bootstrap.servers=kafka-dest:9092
brooklin.server.transportProvider.kafkaTransportProvider.zookeeper.connect=kafka-dest:2181
brooklin.server.transportProvider.kafkaTransportProvider.client.id=datastream-producer
########################### File connector Configs ######################
brooklin.server.connector.file.factoryClassName=com.linkedin.datastream.connectors.file.FileConnectorFactory
brooklin.server.connector.file.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory
brooklin.server.connector.file.strategy.maxTasks=1
########################### Test event producing connector Configs ######################
brooklin.server.connector.test.factoryClassName=com.linkedin.datastream.connectors.TestEventProducingConnectorFactory
brooklin.server.connector.test.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.LoadbalancingStrategyFactory
brooklin.server.connector.test.strategy.TasksPerDatastream = 4
########################### Kafka Mirroring connector Configs ######################
brooklin.server.connector.kafkaMirroringConnector.factoryClassName=com.linkedin.datastream.connectors.kafka.mirrormaker.KafkaMirrorMakerConnectorFactory
brooklin.server.connector.kafkaMirroringConnector.assignmentStrategyFactory=com.linkedin.datastream.server.assignment.BroadcastStrategyFactory
I then try to start the following Datastream:
bin/brooklin-rest-client.sh -o CREATE -u http://localhost:32311/ -n first-mirroring-stream -s "kafka://localhost:9092/test" -c kafkaMirroringConnector -t kafkaTransportProvider -m '{"owner":"root","system.reuseExistingDestination":"false"}' 2>/dev/null
Checking the Datastream:
bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/ 2>/dev/null
[2020-10-14 05:55:45,087] INFO Creating RestClient for http://localhost:32311/ with {}, count=1 (com.linkedin.datastream.DatastreamRestClientFactory)
[2020-10-14 05:55:45,113] INFO The service 'null' has been assigned to the ChannelPoolManager with key 'noSpecifiedNamePrefix 1138266797 ' (com.linkedin.r2.transport.http.client.HttpClientFactory)
[2020-10-14 05:55:45,215] INFO DatastreamRestClient created with retryPeriodMs=6000 retryTimeoutMs=90000 (com.linkedin.datastream.DatastreamRestClient)
[2020-10-14 05:55:45,502] INFO getAllDatastreams took 272 ms (com.linkedin.datastream.DatastreamRestClient)
{
  "name" : "first-mirroring-stream",
  "connectorName" : "kafkaMirroringConnector",
  "transportProviderName" : "kafkaTransportProvider",
  "source" : {
    "connectionString" : "kafka://localhost:9092/test"
  },
  "Status" : "READY",
  "destination" : {
    "connectionString" : "kafka://kafka-dest:9092/*"
  },
  "metadata" : {
    "datastreamUUID" : "df081002-fc7b-4f3a-b1ce-016e879d4b29",
    "group.id" : "first-mirroring-stream",
    "owner" : "root",
    "system.IsConnectorManagedDestination" : "true",
    "system.creation.ms" : "1602665999603",
    "system.destination.KafkaBrokers" : "kafka-dest:9092",
    "system.reuseExistingDestination" : "false",
    "system.taskPrefix" : "first-mirroring-stream"
  }
}
After this is running, I produce to the topic on the source cluster and consume from the destination cluster, but nothing gets mirrored.
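For reference, the produce/consume check is done with the stock Kafka console tools, roughly like this (broker addresses as in the configuration above):
# produce a few records on the source cluster
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# consume on the destination cluster
bin/kafka-console-consumer.sh --bootstrap-server kafka-dest:9092 --topic test --from-beginning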
Does anyone have a clue what I'm missing/what I did wrong?
Thanks!

This was an issue on my end - I had a typo in the topic name configured for mirroring.
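A quick way to catch that kind of mismatch is to compare the topics that actually exist on the source cluster with the source connection string the datastream reports, for example (assuming the stock Kafka CLI):
# topics present on the source cluster
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
# source topic each datastream is configured to mirror
bin/brooklin-rest-client.sh -o READALL -u http://localhost:32311/ 2>/dev/null | grep connectionString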

Related

Spark Job SUBMITTED but not RUNNING after submit via REST API

Following the instructions on this website, I'm trying to submit a job to Spark via the REST API /v1/submissions.
I tried to submit the SparkPi example:
$ ./create.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20211212044718-0003",
  "serverSparkVersion" : "3.1.2",
  "submissionId" : "driver-20211212044718-0003",
  "success" : true
}
$ ./status.sh driver-20211212044718-0003
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "SUBMITTED",
  "serverSparkVersion" : "3.1.2",
  "submissionId" : "driver-20211212044718-0003",
  "success" : true
}
create.sh:
curl -X POST http://172.17.197.143:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "appResource": "/home/ruc/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar",
  "sparkProperties": {
    "spark.master": "spark://172.17.197.143:7077",
    "spark.driver.memory": "1g",
    "spark.driver.cores": "1",
    "spark.app.name": "REST API - PI",
    "spark.jars": "/home/ruc/spark-3.1.2/examples/jars/spark-examples_2.12-3.1.2.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "3.1.2",
  "mainClass": "org.apache.spark.examples.SparkPi",
  "action": "CreateSubmissionRequest",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "appArgs": [
    "400"
  ]
}'
status.sh:
export DRIVER_ID=$1
curl http://172.17.197.143:6066/v1/submissions/status/$DRIVER_ID
But when I try to get the status of the job (even after a few minutes), I get "SUBMITTED" rather than "RUNNING" or "FINISHED".
Then I looked at the master log and found:
21/12/12 04:47:18 INFO master.Master: Driver submitted org.apache.spark.deploy.worker.DriverWrapper
21/12/12 04:47:18 WARN master.Master: Driver driver-20211212044718-0003 requires more resource than any of Workers could have.
# ...
21/12/12 04:49:02 WARN master.Master: Driver driver-20211212044718-0003 requires more resource than any of Workers could have.
However, in my spark-env.sh, I have
export SPARK_WORKER_MEMORY=10g
export SPARK_WORKER_CORES=2
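One way to double-check what the master has actually registered from the workers (assuming the standalone master web UI on its default port 8080) is its JSON endpoint, which lists each registered worker's cores and free memory:
curl http://172.17.197.143:8080/json/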
I have no idea what happened. How can I make it run normally?
Since you've checked the resources and you have enough, it might be a network issue: the executors may not be able to connect back to the driver program. Allow traffic on both the master and the workers.
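If you want to rule that out, a rough connectivity check from a worker node toward the master could look like the following (the ports are the standalone defaults plus the REST port from your setup; the subnet in the firewall rule is only an assumed example):
# from a worker: can we reach the master's cluster port and REST submission port?
nc -zv 172.17.197.143 7077
nc -zv 172.17.197.143 6066
# in cluster mode the driver runs on a worker, so the workers must also accept
# inbound connections from the master and from each other, e.g. with ufw:
sudo ufw allow from 172.17.197.0/24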

Cassandra Sink Connector for Confluent Platform

I am trying to run the Cassandra sink connector for Confluent Platform. The cassandra-sink.json file is as below:
{
  "name" : "cassandra-sink",
  "config" : {
    "connector.class" : "io.confluent.connect.cassandra.CassandraSinkConnector",
    "tasks.max" : "1",
    "topics" : "topic1",
    "cassandra.contact.points" : "127.0.0.1",
    "cassandra.keyspace" : "test",
    "confluent.topic.bootstrap.servers": "127.0.0.1:9092",
    "cassandra.write.mode" : "Update",
    "connect.cassandra.port":"127.0.0.1:9042"
  }
}
I installed the connector with confluent-hub install confluentinc/kafka-connect-cassandra:latest, as per the link.
I am able to load the file, but when I check the status I get the error below. I am unable to figure out what the issue is.
FAILED worker_id:127.0.0.1:8083,trace:com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed
com.datastax.driver.core.exceptions.TransportException: [/127.0.0.1:9042] Cannot connect
com.datastax.driver.core.ControlConnection.reconnectInternal
com.datastax.driver.core.ControlConnection.connect
com.datastax.driver.core.Cluster$Manager.negotiateProtocolVersionAndConnect
com.datastax.driver.core.Cluster$Manager.init
com.datastax.driver.core.Cluster.init
com.datastax.driver.core.SessionManager.initAsync
com.datastax.driver.core.SessionManager.executeAsync
com.datastax.driver.core.AbstractSession.execute
io.confluent.connect.cassandra.CassandraSessionImpl.executeStatement
io.confluent.connect.cassandra.CassandraSinkConnector.doStart
io.confluent.connect.cassandra.CassandraSinkConnector.start
org.apache.kafka.connect.runtime.WorkerConnector.doStart
org.apache.kafka.connect.runtime.WorkerConnector.start
org.apache.kafka.connect.runtime.WorkerConnector.transitionTo
org.apache.kafka.connect.runtime.Worker.startConnector
org.apache.kafka.connect.runtime.distributed.DistributedHerder.startConnector
org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1300
org.apache.kafka.connect.runtime.distributed.DistributedHerder$14
org.apache.kafka.connect.runtime.distributed.DistributedHerder$14
java.util.concurrent.FutureTask.run
java.util.concurrent.ThreadPoolExecutor.runWorker
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run
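For reference, the status above is taken from the standard Kafka Connect REST API (assuming the default worker port 8083, as shown in the worker_id):
curl -s http://127.0.0.1:8083/connectors/cassandra-sink/status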
Please guide.

Can I use a replica set name to connect via mongo-connector?

I would like to know if there is a way to replicate from one Mongo replica set to another via mongo-connector. As per the Mongo documentation, we can connect two Mongo instances via mongo-connector by using a command as in the example below, but I would like to pass the replica set name or use a configuration file instead of passing server:port on the command line.
Mongo Connector can replicate from one MongoDB replica set or sharded cluster to another using the Mongo DocManager. The most basic usage is like the following:
mongo-connector -m localhost:27017 -t localhost:37017 -d mongo_doc_manager
I also tried the config.json option by creating the config.json file below, but it failed.
{
  "__comment__": "Configuration options starting with '__' are disabled",
  "__comment__": "To enable them, remove the preceding '__'",
  "mainAddress": "localhost:27017",
  "oplogFile": "C:\Dev\mongodb\mongo-connector\oplog.timestamp",
  "verbosity": 2,
  "continueOnError": false,
  "logging": {
    "type": "file",
    "filename": "C:\Dev\mongodb\mongo-connector\mongo-connector.log",
    "__rotationWhen": "D",
    "__rotationInterval": 1,
    "__rotationBackups": 10,
    "__type": "syslog"
  },
  "docManagers": [
    {
      "docManager": "mongo_doc_manager",
      "targetURL": "localhost:37010",
      "__autoCommitInterval": null
    }
  ]
}
Yes, it's possible to connect to a replica set or a shard server using mongo-connector.
mongo-connector -m <mongodb server hostname>:<replica set port> \
  -t <replication endpoint URL, e.g. http://localhost:8983/solr> \
  -d <name of doc manager, e.g., solr_doc_manager>
You can also pass a connection string to mongo-connector, such as (quoted so the shell does not interpret the ? and &):
mongo-connector -m "mongodb://db1.example.net,db2.example.net:2500/?replicaSet=test&connectTimeoutMS=300000"
To specify a specific config file you can use:
mongo-connector -c config.json
where config.json is your config file.
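Putting the two together, a run that uses the config file but still targets a named replica set might look roughly like this (the hostnames and replica set name are only placeholders):
mongo-connector -c config.json \
  -m "mongodb://db1.example.net:27017,db2.example.net:27017/?replicaSet=test" \
  -t localhost:37017 \
  -d mongo_doc_manager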
I was able to resolve my issue by escaping the backslashes ('\\') in my Windows directory paths. Here is my updated config file for reference. Thanks to ShaneHarvey: Not able to use Configuration file for connecting to mongo-connector.
{
  "__comment__": "Configuration options starting with '__' are disabled",
  "__comment__": "To enable them, remove the preceding '__'",
  "mainAddress": "localhost:27017",
  "oplogFile": "C:\\Dev\\mongodb\\mongo-connector\\oplog.timestamp",
  "noDump": false,
  "batchSize": -1,
  "verbosity": 2,
  "continueOnError": false,
  "logging": {
    "type": "file",
    "filename": "C:\\Dev\\mongodb\\mongo-connector\\mongo-connector.log",
    "__format": "%(asctime)s [%(levelname)s] %(name)s:%(lineno)d - %(message)s",
    "__rotationWhen": "D",
    "__rotationInterval": 1,
    "__rotationBackups": 10,
    "__type": "syslog",
    "__host": "localhost:27017"
  },
  "docManagers": [
    {
      "docManager": "mongo_doc_manager",
      "targetURL": "localhost:37017",
      "__autoCommitInterval": null
    }
  ]
}

Fiware Cygnus: no data has been persisted in MongoDB

I am trying to use Cygnus with MongoDB, but no data has been persisted in the database.
Here is the notification received by Cygnus:
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Starting transaction (1437482681-118-0000000000)
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Received data ({ "subscriptionId" : "55a73819d0c457bb20b1d467", "originator" : "localhost", "contextResponses" : [ { "contextElement" : { "type" : "enocean", "isPattern" : "false", "id" : "enocean:myButtonA", "attributes" : [ { "name" : "ButtonValue", "type" : "", "value" : "ON", "metadatas" : [ { "name" : "TimeInstant", "type" : "ISO8601", "value" : "2015-07-20T21:29:56.509293Z" } ] } ] }, "statusCode" : { "code" : "200", "reasonPhrase" : "OK" } } ]})
15/07/21 14:48:01 INFO handlers.OrionRestHandler: Event put in the channel (id=1454120446, ttl=10)
Here is my agent configuration:
cygnusagent.sources = http-source
cygnusagent.sinks = OrionMongoSink
cygnusagent.channels = mongo-channel
#=============================================
# source configuration
# channel name where to write the notification events
cygnusagent.sources.http-source.channels = mongo-channel
# source class, must not be changed
cygnusagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
# listening port the Flume source will use for receiving incoming notifications
cygnusagent.sources.http-source.port = 5050
# Flume handler that will parse the notifications, must not be changed
cygnusagent.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.OrionRestHandler
# URL target
cygnusagent.sources.http-source.handler.notification_target = /notify
# Default service (service semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service = def_serv
# Default service path (service path semantic depends on the persistence sink)
cygnusagent.sources.http-source.handler.default_service_path = def_servpath
# Number of channel re-injection retries before a Flume event is definitely discarded (-1 means infinite retries)
cygnusagent.sources.http-source.handler.events_ttl = 10
# Source interceptors, do not change
cygnusagent.sources.http-source.interceptors = ts gi
# TimestampInterceptor, do not change
cygnusagent.sources.http-source.interceptors.ts.type = timestamp
# GroupinInterceptor, do not change
cygnusagent.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.GroupingInterceptor$Builder
# Grouping rules for the GroupingInterceptor, put the right absolute path to the file if necessary
# See the doc/design/interceptors document for more details
cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file = /home/egm_demo/usr/fiware-cygnus/conf/grouping_rules.conf
# ============================================
# OrionMongoSink configuration
# sink class, must not be changed
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
# channel name from where to read notification events
cygnusagent.sinks.mongo-sink.channel = mongo-channel
# FQDN/IP:port where the MongoDB server runs (standalone case) or comma-separated list of FQDN/IP:port pairs where the MongoDB replica set members run
cygnusagent.sinks.mongo-sink.mongo_hosts = 127.0.0.1:27017
# a valid user in the MongoDB server (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_username =
# password for the user above (or empty if authentication is not enabled in MongoDB)
cygnusagent.sinks.mongo-sink.mongo_password =
# prefix for the MongoDB databases
#cygnusagent.sinks.mongo-sink.db_prefix = kura
# prefix for the MongoDB collections
#cygnusagent.sinks.mongo-sink.collection_prefix = button
# true if collection names are based on a hash, false for human-readable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# ============================================
# mongo-channel configuration
# channel type (must not be changed)
cygnusagent.channels.mongo-channel.type = memory
# capacity of the channel
cygnusagent.channels.mongo-channel.capacity = 1000
# amount of bytes that can be sent per transaction
cygnusagent.channels.mongo-channel.transactionCapacity = 100
Here is my grouping rule:
{
  "grouping_rules": [
    {
      "id": 1,
      "fields": [
        "button"
      ],
      "regex": ".*",
      "destination": "kura",
      "fiware_service_path": "/kuraspath"
    }
  ]
}
Any ideas of what I have missed? Thanks in advance for your help!
This configuration parameter is wrong:
cygnusagent.sinks = OrionMongoSink
According to your configuration, it must be mongo-sink (I mean, you are configuring a Mongo sink named mongo-sink when you configure lines such as cygnusagent.sinks.mongo-sink.type).
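In other words, a minimal sketch of the corrected lines (keeping the rest of your configuration as it is) would be:
cygnusagent.sinks = mongo-sink
cygnusagent.sinks.mongo-sink.type = com.telefonica.iot.cygnus.sinks.OrionMongoSink
cygnusagent.sinks.mongo-sink.channel = mongo-channel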
In addition, I would recommend not using the grouping rules feature; it is an advanced feature for sending the data to a collection different from the default one, and in a first stage I would play with the default behaviour. Thus, my recommendation is to leave the path to the file in cygnusagent.sources.http-source.interceptors.gi.grouping_rules_conf_file, but comment out all the JSON within it :)

mongod not logging after log rotate

Originally, mongod was running and logging to the file shard1.log. Then I ran mongod again and that second start failed.
shard1.log now only contains information about the failure of the second command, and the original log was renamed to shard1.log.2015-02-01T07-41-46.
But the problem is that the original mongod is not logging any more.
I checked the files opened by the original mongod:
ls -l /proc/7980/fd | grep -v socket | grep log
l-wx------ 1 mongodb mongodb 64 2月 6 06:05 1 -> /data/mongodb/log/shard1.log.2015-02-01T07-41-46
The shard1.log.2015-02-01T07-41-46 stayed unchanged.
So how can I make the original mongod log again?
Edit
Here is my mongod config file:
logpath=/data/mongodb/log/shard1.log
fork=true
dbpath=/data/mongodb/db/shard1
pidfilepath=/data/mongodb/pid/shard1.pid
shardsvr=true
replSet=shard1/10.0.0.1:27017
port=27017
bind_ip=10.0.0.2
The output of the db.serverCmdLineOpts() command:
{
  "argv" : [
    "/data/mongodb/bin/mongod",
    "-f",
    "/data/mongodb/conf/shard1.conf"
  ],
  "parsed" : {
    "bind_ip" : "10.0.0.2",
    "config" : "/data/mongodb/conf/shard1.conf",
    "dbpath" : "/data/mongodb/db/shard1",
    "fork" : "true",
    "logpath" : "/data/mongodb/log/shard1.log",
    "pidfilepath" : "/data/mongodb/pid/shard1.pid",
    "port" : 27017,
    "replSet" : "shard1/10.0.0.1:27017",
    "shardsvr" : "true"
  },
  "ok" : 1
}
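For what it's worth, a running mongod can be told to close its current log file and reopen the configured logpath with the built-in logRotate command (or SIGUSR1); a sketch, using the host, port and pidfile from the config above:
# ask mongod to rotate/reopen its log via the admin command
mongo --host 10.0.0.2 --port 27017 --eval 'db.adminCommand({ logRotate: 1 })'
# or signal the process directly
kill -SIGUSR1 $(cat /data/mongodb/pid/shard1.pid)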