Unable to override Lagom Kafka parameters - apache-kafka

I created a plain Java project and put all the Lagom Kafka client dependencies on the classpath, then placed an application.conf in the source folder.
Content of application.conf:
lagom.broker.kafka {
  service-name = ""
  brokers = "127.0.0.1:9092"
}
When running the application, service-name = "" should be used (so that my broker address is used directly instead of service discovery), but it was not working.
While debugging I found that in the KafkaConfig class, service-name comes out as "kafka_native".
I also found that when KafkaConfig is created, the Config object it receives does not have my application.conf in its origin.
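(For reference, a value's origin can be checked with plain Typesafe Config outside of Lagom; a minimal sketch, nothing Lagom-specific:)
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ConfigOriginCheck {
    public static void main(String[] args) {
        Config config = ConfigFactory.load();
        // prints the file or system property that supplied the effective value
        System.out.println(config.getValue("lagom.broker.kafka.service-name").origin().description());
        System.out.println(config.getString("lagom.broker.kafka.service-name"));
    }
}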
After this I tried overriding the values using VM parameters like this:
-Dlagom.broker.kafka.service-name=""
-Dlagom.broker.kafka.brokers="127.0.0.1:9092"
-Dakka.kafka.consumer.kafka-clients.auto.offset.reset="earliest"
and it worked.
Can somebody explain why overriding them in application.conf is not working?
This is how I am subscribing to the topic:
import java.net.URI;
import java.util.concurrent.CompletableFuture;

import com.ameyo.ticketing.ticket.api.TicketingService;
import com.ameyo.ticketing.ticket.api.events.TicketEvent;
import com.lightbend.lagom.javadsl.api.broker.Topic;
import com.lightbend.lagom.javadsl.client.integration.LagomClientFactory;
import com.typesafe.config.ConfigFactory;

import akka.Done;
import akka.stream.javadsl.Flow;

public class Main {

    public static void main(String[] args) {
        String brokers = ConfigFactory.load().getString("lagom.broker.kafka.brokers");
        System.out.println("Initial value for brokers: " + brokers);

        LagomClientFactory clientFactory = LagomClientFactory.create("legacy-system", Main.class.getClassLoader());
        TicketingService ticketingService = clientFactory.createClient(TicketingService.class,
                URI.create("http://localhost:11000"));

        Topic<TicketEvent> ticketEvents = ticketingService.ticketEvents();
        ticketEvents.subscribe().withGroupId("nya13").atLeastOnce(Flow.<TicketEvent>create().mapAsync(1, e -> {
            System.out.println("received a ticket event");
            return CompletableFuture.completedFuture(Done.getInstance());
        }));

        try {
            // keep the JVM alive long enough to consume events
            Thread.sleep(1000000000);
        } catch (InterruptedException e1) {
        }
    }
}

Change the configuration to:
akka {
  lagom.broker.kafka {
    service-name = ""
    brokers = "127.0.0.1:9092"
  }
}
and it worked.
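A quick sanity check with plain Typesafe Config (a minimal sketch; the path assumes the akka-prefixed layout above):
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class VerifyOverride {
    public static void main(String[] args) {
        Config config = ConfigFactory.load();
        // with the nested layout, the values resolve under the akka prefix
        System.out.println(config.getString("akka.lagom.broker.kafka.brokers"));
        // note: -D system properties still override application.conf if both are set
    }
}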

Related

Vert.x ConfigRetriever.listen does not get triggered when the config file changes

I have a simple verticle that I am using to test ConfigRetriever.listen for changes.
Using Vert.x version 4.3.4
import io.vertx.config.ConfigRetriever;
import io.vertx.config.ConfigRetrieverOptions;
import io.vertx.config.ConfigStoreOptions;
import io.vertx.core.AbstractVerticle;
import io.vertx.core.json.JsonObject;

public class MyVerticle extends AbstractVerticle {

    @Override
    public void start() {
        ConfigStoreOptions fileStore = new ConfigStoreOptions().setType("file")
                .setFormat("yaml")
                .setConfig(new JsonObject().put("path", "config.yaml"));

        ConfigRetrieverOptions options = new ConfigRetrieverOptions().setScanPeriod(1000)
                .addStore(fileStore);

        ConfigRetriever retriever = ConfigRetriever.create(vertx, options);
        retriever.listen(change -> {
            JsonObject previous = change.getPreviousConfiguration();
            System.out.println(previous);
            JsonObject changedConf = change.getNewConfiguration();
            System.out.println(changedConf);
        });
    }
}
[Edit] The config file is under src/main/resources.
When I run this, I get an output of the before as empty and after as config in my yaml file.
{}
{"bridgeservice":{"eb_address":"xyz","path":"/api/v1/aaa/","port":80}}
The problem is that when I change a value in the YAML config file, nothing happens. I expect the changes to get printed. When I run this in the debugger I see
Thread [vert.x-internal-blocking-0] (Running)
..
..
..
Thread [vert.x-internal-blocking-19] (Running)
When I put the following just before retriever.listen(), I get the succeeded... line printed and nothing from the listen method, even after changing the config file values.
retriever.getConfig(ar -> {
    if (ar.succeeded()) {
        System.out.println("succeeded :" + ar.result());
    } else {
        ar.cause().printStackTrace();
    }
});
May be related to SO having-trouble-listen-vert-x-config-change
When I moved my config file from resources to a folder cfg at the same level as src, the verticle behaved as it should and picked up config changes. I don't know why; maybe it's an Eclipse environment thing.
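A minimal sketch of the store pointing at the external folder (assuming the file now lives at cfg/config.yaml relative to the working directory), so the scanner watches the file that is actually edited rather than the copy in the build output:
// inside the verticle's start(), replacing the fileStore setup above
ConfigStoreOptions fileStore = new ConfigStoreOptions()
        .setType("file")
        .setFormat("yaml")
        // path outside src/main/resources, resolved against the working directory
        .setConfig(new JsonObject().put("path", "cfg/config.yaml"));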

Kafka Connect using REST API with Strimzi with kind: KafkaConnector

I'm trying to use the Kafka Connect REST API for managing connectors; for simplicity, consider the following pause implementation:
def pause(): Unit = {
  logger.info(s"pause() Triggered")
  val response = HttpClient.newHttpClient.send({
    HttpRequest
      .newBuilder(URI.create(config.connectUrl + s"/connectors/${config.connectorName}/pause"))
      .PUT(BodyPublishers.noBody)
      .timeout(Duration.ofMillis(config.timeout.toMillis.toInt))
      .build()
  }, BodyHandlers.ofString)

  if (response.statusCode() != HTTPStatus.Accepted) {
    throw new Exception(s"Could not pause connector: ${response.body}")
  }
}
Since I'm using KafkaConnector as a resource, I cannot use the Kafka Connect REST API, because the connector operator has the KafkaConnector resources as its single source of truth; manual changes such as pause made directly via the Kafka Connect REST API are reverted by the Cluster Operator.
So to pause the connector I need to edit the resource in some way.
I'm struggling to change the logic of the current function; it would be great to have some practical examples of how to handle KafkaConnector resources.
I checked out the Using Strimzi docs but couldn't find any practical example.
Thanks!
After help from @Jakub I managed to create my new client:
class KubernetesService(config: Configuration) extends StrictLogging {

  private[this] val client = new DefaultKubernetesClient(Config.autoConfigure(config.connectorContext))

  def setPause(pause: Boolean): Unit = {
    logger.info(s"[KubernetesService] - setPause($pause) Triggered")
    val connector = getConnector()
    connector.getSpec.setPause(pause)
    Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).replace(connector)

    Crds.kafkaConnectorOperation(client)
      .inNamespace(config.connectorNamespace)
      .withName(config.connectorName)
      .waitUntilCondition(connector => {
        connector != null &&
        connector.getSpec.getPause == pause && {
          val desiredState = if (pause) "Paused" else "Running"
          connector.getStatus.getConditions.stream().anyMatch(_.getType.equalsIgnoreCase(desiredState))
        }
      }, config.timeout.toMillis, TimeUnit.MILLISECONDS)
  }

  def delete(): Unit = {
    logger.info(s"[KubernetesService] - delete() Triggered")
    Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).delete
    Crds.kafkaConnectorOperation(client)
      .inNamespace(config.connectorNamespace)
      .withName(config.connectorName)
      .waitUntilCondition(_ == null, config.timeout.toMillis, TimeUnit.MILLISECONDS)
  }

  def create(oldKafkaConnect: KafkaConnector): Unit = {
    logger.info(s"[KubernetesService] - create(${oldKafkaConnect.getMetadata}) Triggered")
    Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).create(oldKafkaConnect)
    Crds.kafkaConnectorOperation(client)
      .inNamespace(config.connectorNamespace)
      .withName(config.connectorName)
      .waitUntilCondition(connector => {
        connector != null &&
        connector.getStatus.getConditions.stream().anyMatch(_.getType.equalsIgnoreCase("Running"))
      }, config.timeout.toMillis, TimeUnit.MILLISECONDS)
  }

  def getConnector(): KafkaConnector = {
    logger.info(s"[KubernetesService] - getConnector() Triggered")
    Try {
      Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).get
    } match {
      case Success(connector) => connector
      case Failure(_: NullPointerException) => throw new NullPointerException(s"Failure on getConnector(${config.connectorName}) on ns: ${config.connectorNamespace}, context: ${config.connectorContext}")
      case Failure(exception) => throw exception
    }
  }
}
To pause the connector, you can edit the KafkaConnector resource and set the pause field in .spec to true (see the docs). There are several options for how you can do it. You can use kubectl and either apply the new YAML from a file (kubectl apply) or do it interactively using kubectl edit.
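For completeness (not covered above), a one-off merge patch can also flip the flag from the command line; the namespace and connector name below are placeholders:
kubectl patch kafkaconnector my-connector -n myproject --type merge -p '{"spec":{"pause":true}}'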
If you want to do it programmatically, you will need to use a Kubernetes client to edit the resource. In Java, you can also use the api module of Strimzi, which has all the structures for editing the resources. I put together a simple example for pausing the Kafka connector in Java using the Fabric8 Kubernetes client and the api module:
package cz.scholz.strimzi.api.examples;

import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.dsl.MixedOperation;
import io.fabric8.kubernetes.client.dsl.Resource;
import io.strimzi.api.kafka.Crds;
import io.strimzi.api.kafka.KafkaConnectorList;
import io.strimzi.api.kafka.model.KafkaConnector;

public class PauseConnector {
    public static void main(String[] args) {
        String namespace = "myproject";
        String crName = "my-connector";

        KubernetesClient client = new DefaultKubernetesClient();
        MixedOperation<KafkaConnector, KafkaConnectorList, Resource<KafkaConnector>> op = Crds.kafkaConnectorOperation(client);

        KafkaConnector connector = op.inNamespace(namespace).withName(crName).get();
        connector.getSpec().setPause(true);
        op.inNamespace(namespace).withName(crName).replace(connector);

        client.close();
    }
}
(See https://github.com/scholzj/strimzi-api-examples for the full project)
I'm not a Scala user, but I assume it should be usable from Scala as well; I leave rewriting it from Java to Scala to you.

Kafka not receiving message from external producer

Update #2: I believe I found the solution and am answering this for completeness. It seems that I need to set the following configurations to my instance's public IP address and port 9092 (a sketch follows the list):
advertised.host.name
advertised.port
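In server.properties on each broker this looks roughly like the following (0.9.x property names; the IP is a placeholder for the instance's public address):
advertised.host.name=203.0.113.10
advertised.port=9092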
I am running a two-node Kafka cluster on version 0.9.0.1. I have gone through the quick start and tested that I can send and receive messages internally, i.e. producing a message on the first Kafka node and consuming it on the second. Both nodes are on the same network.
I then tried to produce some messages from an external box outside of the network the Kafka nodes are on. I have ensured the appropriate ports are open and tested that I can reach both boxes from my producer machine using telnet on port 9092.
Everything seems to work fine; my small Java app sends the messages without error, but the consumer never receives anything. I have checked the Kafka logs and nothing is there. Are there any additional configurations that need to be set in the server.properties file in order for the Kafka servers to accept messages from producers outside of their internal network?
Update: the issue seems to be isolated to my producer. When Kafka is running locally on the same machine that runs the Java producer code, it works as expected. When I run the Java code and produce the same messages to an external machine, the message passed to the producer.send call never actually goes out.
I have verified this by running the following command.
sudo tcpdump -n -i en0 -xX port 9092
This monitors the traffic being sent out from my producer machine on port 9092. I can make out the topic, but the message is clearly not there.
Here is the code I am using for the producer:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.storm.shade.org.yaml.snakeyaml.Yaml;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.*;
import java.util.Map;
import java.util.Properties;
import static constants.Constants.*;
public class TestProducer {
// variable declarations
private KafkaProducer<String, String> _producer;
private Properties _props;
private String _topic;
public TestProducer(String configPath, String topic) {
this._props = initialize(configPath);
this._producer = new KafkaProducer<>(_props);
this._topic = topic;
}
private Properties initialize(String configPath) {
Yaml yaml = new Yaml();
Properties props = new Properties();
try {
InputStream is = new FileInputStream(new File(configPath));
// Parse the YAML file and return the output as a series of Maps and Lists
Map<String, Object> config = (Map<String, Object>) yaml.load(is);
props.put("bootstrap.servers", config.get(PROD_BOOTSTRAP_SERVERS));
props.put("acks", config.get(PROD_ACKS));
props.put("retries", config.get(PROD_RETRIES));
props.put("batch.size", config.get(PROD_BATCH_SIZE));
props.put("linger.ms", config.get(PROD_LINGER_MS));
props.put("buffer.memory", config.get(PROD_BUFFER_MEMORY));
props.put("key.serializer", config.get(PROD_KEY_SERIALIZER));
props.put("value.serializer", config.get(PROD_VALUE_SERIALIZER));
} catch( FileNotFoundException e ){
LOG.error("FileNotFoundException: could not initialize Properties");
}
return props;
}
public void produce(String message) {
try {
//verify that we have supplied a topic
if( _topic != null && !_topic.isEmpty() ) {
System.out.println("Producing message:" + message);
_producer.send(new ProducerRecord<String, String>(_topic, message));
}
} catch (Throwable throwable) {
System.out.printf("%s", throwable.getStackTrace());
}
}
public void close() {
_producer.close();
}
}

How to run Hadoop jobs in Amazon EMR using Eclipse?

I have followed the tutorial given by Amazon here, but it seems that my code fails to run.
The error that I got:
Exception in thread "main" java.lang.Error: Unresolved compilation problem:
The method withJobFlowRole(String) is undefined for the type AddJobFlowStepsRequest
at main.main(main.java:38)
My full code:
import java.io.IOException;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.elasticmapreduce.*;
import com.amazonaws.services.elasticmapreduce.model.AddJobFlowStepsRequest;
import com.amazonaws.services.elasticmapreduce.model.AddJobFlowStepsResult;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class main {
    public static void main(String[] args) {
        AWSCredentials credentials = null;
        try {
            credentials = new PropertiesCredentials(
                    main.class.getResourceAsStream("AwsCredentials.properties"));
        } catch (IOException e1) {
            System.out.println("Credentials were not properly entered into AwsCredentials.properties.");
            System.out.println(e1.getMessage());
            System.exit(-1);
        }

        AmazonElasticMapReduce client = new AmazonElasticMapReduceClient(credentials);

        // predefined steps. See StepFactory for a list of predefined steps
        StepConfig hive = new StepConfig("Hive", new StepFactory().newInstallHiveStep());

        // a custom step
        HadoopJarStepConfig hadoopConfig1 = new HadoopJarStepConfig()
                .withJar("s3://mywordcountbuckett/binary/WordCount.jar")
                .withMainClass("com.my.Main1") // optional main class; can be omitted if the jar has a manifest
                .withArgs("--verbose");        // optional list of arguments
        StepConfig customStep = new StepConfig("Step1", hadoopConfig1);

        AddJobFlowStepsResult result = client.addJobFlowSteps(new AddJobFlowStepsRequest()
                .withJobFlowRole("jobflow_role")
                .withServiceRole("service_role")
                .withSteps(hive, customStep));

        System.out.println(result.getStepIds());
    }
}
What could be the reason that the code is not running?
Are there any tutorials based on the latest version?

What is the most efficient way of moving data out of Hive and into MongoDB?

Is there an elegant, easy and fast way to move data out of Hive into MongoDB?
You can do the export with the Hadoop-MongoDB connector. Just run the Hive query in your job's main method. This output will then be used by the Mapper in order to insert the data into MongoDB.
Example:
Here I'm inserting a semicolon-separated text file (id;firstname;lastname) into a MongoDB collection using a simple Hive query:
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import com.mongodb.hadoop.MongoOutputFormat;
import com.mongodb.hadoop.io.BSONWritable;
import com.mongodb.hadoop.util.MongoConfigUtil;

public class HiveToMongo extends Configured implements Tool {

    private static class HiveToMongoMapper extends
            Mapper<LongWritable, Text, IntWritable, BSONWritable> {

        // Hive exports fields separated by the \001 control character
        // See: https://issues.apache.org/jira/browse/HIVE-634
        private static final String HIVE_EXPORT_DELIMETER = '\001' + "";

        private IntWritable k = new IntWritable();
        private BSONWritable v = null;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] split = value.toString().split(HIVE_EXPORT_DELIMETER);
            k.set(Integer.parseInt(split[0]));
            v = new BSONWritable();
            v.put("firstname", split[1]);
            v.put("lastname", split[2]);
            context.write(k, v);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        } catch (ClassNotFoundException e) {
            System.out.println("Unable to load Hive Driver");
            System.exit(1);
        }

        try {
            Connection con = DriverManager.getConnection(
                    "jdbc:hive://localhost:10000/default");
            Statement stmt = con.createStatement();
            String sql = "INSERT OVERWRITE DIRECTORY " +
                    "'hdfs://localhost:8020/user/hive/tmp' select * from users";
            stmt.executeQuery(sql);
        } catch (SQLException e) {
            System.exit(1);
        }

        int res = ToolRunner.run(new Configuration(), new HiveToMongo(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Path inputPath = new Path("/user/hive/tmp");
        String mongoDbPath = "mongodb://127.0.0.1:6900/mongo_users.mycoll";
        MongoConfigUtil.setOutputURI(conf, mongoDbPath);

        /*
        Add dependencies to the distributed cache via
        DistributedCache.addFileToClassPath(...):
        - mongo-hadoop-core-x.x.x.jar
        - mongo-java-driver-x.x.x.jar
        - hive-jdbc-x.x.x.jar
        HadoopUtils is an own utility class
        */
        HadoopUtils.addDependenciesToDistributedCache("/libs/mongodb", conf);
        HadoopUtils.addDependenciesToDistributedCache("/libs/hive", conf);

        Job job = new Job(conf, "HiveToMongo");

        FileInputFormat.setInputPaths(job, inputPath);
        job.setJarByClass(HiveToMongo.class);
        job.setMapperClass(HiveToMongoMapper.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(MongoOutputFormat.class);
        // match the mapper's output types (IntWritable key, BSONWritable value)
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(BSONWritable.class);
        job.setNumReduceTasks(0);

        job.submit();
        System.out.println("Job submitted.");
        return 0;
    }
}
One drawback is that a 'staging area' (/user/hive/tmp) is needed to store the intermediate Hive output. Furthermore, as far as I know, the Mongo-Hadoop connector doesn't support upserts.
I'm not quite sure, but you could also try to fetch the data from Hive without running hiveserver (which exposes a Thrift service), so you can probably save some overhead.
Look at the source code of Hive's org.apache.hadoop.hive.cli.CliDriver#processLine(String line, boolean allowInterupting) method, which actually executes the query. Then you can hack together something like this:
...
LogUtils.initHiveLog4j();
CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
ss.in = System.in;
ss.out = new PrintStream(System.out, true, "UTF-8");
ss.err = new PrintStream(System.err, true, "UTF-8");
SessionState.start(ss);
Driver qp = new Driver();
processLocalCmd("SELECT * from users", qp, ss); //taken from CliDriver
...
Side notes:
There's also a hive-mongo connector implementation you might check.
It's also worth having a look at the implementation of the Hive-HBase connector to get some ideas if you want to implement a similar one for MongoDB.
Have you looked into Sqoop? It's supposed to make it very simple to move data between Hadoop and SQL/NoSQL databases. This article also gives an example of using it with Hive.
Take a look at the Hadoop-MongoDB connector project:
http://api.mongodb.org/hadoop/MongoDB%2BHadoop+Connector.html
"This connectivity takes the form of allowing both reading MongoDB data into Hadoop (for use in MapReduce jobs as well as other components of the Hadoop ecosystem), as well as writing the results of Hadoop jobs out to MongoDB."
Not sure if it will work for your use case, but it's worth looking at.