Storm-Kafka-client from Storm 1.0.1 - apache-kafka

Based on the Storm documentation supported implementation of KafkaSpout is based on the old consumer API. I noticed the external package has another implementation named storm-kafka-client.
https://github.com/apache/storm/tree/master/external/storm-kafka-client
It is unclear if the new client release in 1.0.1 is production ready. Does anyone have experience running it?

I posted the same question to the Storm mail list.
the new API is production ready. We should use 1.x branch.
I plan to test with
<!-- https://mvnrepository.com/artifact/org.apache.storm/storm-kafka-client -->
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka-client</artifactId>
<version>1.0.1</version>
</dependency>
Will update on the progress.

Below Code works for me fine!!!
public TopologyBuilder myTopology() {
TopologyBuilder builder = new TopologyBuilder();
try {
KafkaSpoutConfig<String, String> kafkaSpoutConfig = getKafkaSpoutConfig("KAFKA_IP:9092", KAFKA_TOPIC);
KafkaSpout kafkaSpout = new KafkaSpout<>(kafkaSpoutConfig);
builder.setSpout("kafkaSpout", kafkaSpout, 2 * 2);
builder.setBolt("Bolt-1", new TestBolt(), parallelism).shuffleGrouping("kafkaSpout", KAFKA_TOPIC);
} catch (Exception ex) {
}
return builder;
}
Configure Spout.
protected KafkaSpoutConfig<String, String> getKafkaSpoutConfig(String bootstrapServers ,String topic) {
ByTopicRecordTranslator<String, String> trans = new ByTopicRecordTranslator<>(
(r) -> new Values(r.topic(), r.partition(), r.offset(), r.key(), r.value()),
new Fields("topic", "partition", "offset", "key", "value"), topic);
Builder<String, String> builder = KafkaSpoutConfig.builder(bootstrapServers, new String[]{topic});
return builder.setProp(ConsumerConfig.GROUP_ID_CONFIG, topic)
.setProcessingGuarantee(ProcessingGuarantee.AT_LEAST_ONCE)
.setRetry(getRetryService())
.setRecordTranslator(trans)
.setOffsetCommitPeriodMs(10_000)
.setFirstPollOffsetStrategy(UNCOMMITTED_EARLIEST)
.setMaxUncommittedOffsets(1000)
.build();
}
For configure failed messages retyr logic
protected KafkaSpoutRetryService getRetryService() {
return new KafkaSpoutRetryExponentialBackoff(TimeInterval.microSeconds(500),
TimeInterval.milliSeconds(2), Integer.MAX_VALUE, TimeInterval.seconds(10));
}

You can use following maven dependency for storm 1.1.0
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.1.0</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>0.10.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.9.0.0</version>
<exclusions>
<exclusion>
<groupId>org.apache.zookeeper</groupId>
<artifactId>zookeeper</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
</exclusions>
</dependency>
You may face some more dependency issue which you can resolve by adding the required jars.
Also the dependency in java code will change from org.backtype.storm.XXXXX to org.apache.storm.XXXXX

Related

Spring boot data R2DBC requires transaction for read operations

Im trying to fetch a list of objects from the database using Spring boot Webflux with the postgres R2DBC driver, but I get an error saying:
value ignored org.springframework.transaction.reactive.TransactionContextManager$NoTransactionInContextException: No transaction in context Context1{reactor.onNextError.localStrategy=reactor.core.publisher.OnNextFailureStrategy$ResumeStrategy#7c18c255}
it seems all DatabaseClient operations requires to be wrap into a transaction.
I tried different combinations of the dependencies between spring-boot-data and r2db but didn't really work.
Version:
<spring-boot.version>2.2.0.RC1</spring-boot.version>
<spring-data-r2dbc.version>1.0.0.BUILD-SNAPSHOT</spring-data-r2dbc.version>
<r2dbc-releasetrain.version>Arabba-M8</r2dbc-releasetrain.version>
Dependencies:
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-r2dbc</artifactId>
<version>${spring-data-r2dbc.version}</version>
</dependency>
<dependency>
<groupId>io.r2dbc</groupId>
<artifactId>r2dbc-postgresql</artifactId>
</dependency>
<dependency>
<groupId>org.postgresql</groupId>
<artifactId>postgresql</artifactId>
</dependency>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>io.r2dbc</groupId>
<artifactId>r2dbc-bom</artifactId>
<version>${r2dbc-releasetrain.version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
fun findAll(): Flux<Game> {
val games = client
.select()
.from(Game::class.java)
.fetch()
.all()
.onErrorContinue{ throwable, o -> System.out.println("value ignored $throwable $o") }
games.subscribe()
return Flux.empty()
}
#Table("game")
data class Game(#Id val id: UUID = UUID.randomUUID(),
#Column("guess") val guess: Int = Random.nextInt(500))
Github repo: https://github.com/odfsoft/spring-boot-guess-game/tree/r2dbc-issue
I expect read operations to not require #Transactional or to run the query without wrapping into the transactional context manually.
UPDATE:
After a few tries with multiple version I manage to find a combination that works:
<spring-data-r2dbc.version>1.0.0.BUILD-SNAPSHOT</spring-data-r2dbc.version>
<r2dbc-postgres.version>0.8.0.RC2</r2dbc-postgres.version>
Dependencies:
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-r2dbc</artifactId>
<version>${spring-data-r2dbc.version}</version>
</dependency>
<dependency>
<groupId>io.r2dbc</groupId>
<artifactId>r2dbc-postgresql</artifactId>
<version>${r2dbc-postgres.version}</version>
</dependency>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>io.r2dbc</groupId>
<artifactId>r2dbc-bom</artifactId>
<version>Arabba-RC2</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
it seems the versions notation went from 1.0.0.M7 to 0.8.x for r2dbc due to the following:
https://r2dbc.io/2019/05/13/r2dbc-0-8-milestone-8-released
https://r2dbc.io/2019/10/07/r2dbc-0-8-rc2-released
but after updating to the latest version a new problem appear which is that a transaction is required to run queries as follow:
Update configuration:
#Configuration
class PostgresConfig : AbstractR2dbcConfiguration() {
#Bean
override fun connectionFactory(): ConnectionFactory {
return PostgresqlConnectionFactory(
PostgresqlConnectionConfiguration.builder()
.host("localhost")
.port(5432)
.username("root")
.password("secret")
.database("game")
.build())
}
#Bean
fun reactiveTransactionManager(connectionFactory: ConnectionFactory): ReactiveTransactionManager {
return R2dbcTransactionManager(connectionFactory)
}
#Bean
fun transactionalOperator(reactiveTransactionManager: ReactiveTransactionManager) =
TransactionalOperator.create(reactiveTransactionManager)
}
Query:
fun findAll(): Flux<Game> {
return client
.execute("select id, guess from game")
.`as`(Game::class.java)
.fetch()
.all()
.`as`(to::transactional)
.onErrorContinue{ throwable, o -> System.out.println("value ignored $throwable $o") }
.log()
}
Disclaimer this is not mean to be used in production!! still before GA.

gRPC spring-boot-starter can not bind #GRpcGlobalInterceptor?

I m using
<dependency>
<groupId>org.lognet</groupId>
<artifactId>grpc-spring-boot-starter</artifactId>
<version>2.1.4</version>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>io.zipkin.brave</groupId>
<artifactId>brave-instrumentation-grpc</artifactId>
<version>4.13.1</version>
</dependency>
<dependency>
<groupId>io.zipkin.brave</groupId>
<artifactId>brave</artifactId>
<version>4.13.1</version>
</dependency>
and I want to use brave-instrumentation-grpc monitor my grpc server application. So I followed advice below:
https://github.com/LogNet/grpc-spring-boot-starter#interceptors-support
https://github.com/openzipkin/brave/tree/master/instrumentation/grpc
#Configuration
public class GrpcFilterConfig{
#GRpcGlobalInterceptor
#Bean
public ServerInterceptor globalInterceptor(){
Tracing tracing = Tracing.newBuilder().build();
GrpcTracing grpcTracing = GrpcTracing.create(tracing);
return grpcTracing.newServerInterceptor();
}
}
The question is : The GlobalInterceptor i defined does not bind to grpcserver.
debug deep insde the GRpcServerRunner.class, it seems that the code does not return beansWithAnnotation
Map<String, Object> beansWithAnnotation = this.applicationContext.getBeansWithAnnotation(annotationType);
beansWithAnnotation is null.
Is there something wrong with my usage?

Kafka-spark Streaming processing jobs synchronically

Im trying a simple test where i use Kafka-connect and spark
I wrote a custom kafka-connect that creates this source record
SourceRecord sr = new SourceRecord(null,
null,
destTopic,
Schema.STRING_SCHEMA,
cleanPath);
in the spark i receive this message like this
val kafkaConsumerParams = Map[String, String](
"metadata.broker.list" -> prop.getProperty("kafka_host"),
"zookeeper.connect" -> prop.getProperty("zookeeper_host"),
"group.id" -> prop.getProperty("kafka_group_id"),
"schema.registry.url" -> prop.getProperty("schema_registry_url"),
"auto.offset.reset" -> prop.getProperty("auto_offset_reset")
)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaConsumerParams, topicsSet)
val ds = messages.foreachRDD(rdd => {
val toPrint = rdd.map(t => {
val file_path = t._2
val startTime = DateTime.now()
Thread.sleep(1000 * 60)
1
}).sum()
LogUtils.getLogger(classOf[DeviceManager]).info(" toPrint = " + toPrint +" (number of flows calculated)")
})
}
when i use the connector to send multiple message to the desired topic ( in my test it had 6 partitions)
The sleep thread gets all the messages, but preforms them synchronically instead of asynchronically.
When i create a simple test producer, the sleeps are done asynchronically.
I Also created 2 simple consumers, and tried both the connector and a producer, and both task were consumed asynchronically
which means my problems lays with the way the spark is receiving the messages sent from the connector.
I cant figure why the tasks are not acting the same way as they do when i send it from a producer.
i even printed the record the spark recieves and they are exactly the same
producer sent record
1: {partition=2, offset=11, value=something, key=null}
2: {partition=5, offset=9, value=something2, key=null}
connect sent record
1: {partition=3, offset=9, value=something, key=null}
the versions used in my projects are
<scala.version>2.11.7</scala.version>
<confluent.version>4.0.0</confluent.version>
<kafka.version>1.0.0</kafka.version>
<java.version>1.8</java.version>
<spark.version>2.0.0</spark.version>
dependencies
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>${confluent.version}</version>
</dependency>
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-schema-registry-client</artifactId>
<version>${confluent.version}</version>
</dependency>
<dependency>
<groupId>org.apache.avro</groupId>
<artifactId>avro</artifactId>
<version>1.8.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>${spark.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>1.6.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.11</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.0.0-RC1</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.8.0</version>
</dependency>
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>${confluent.version}</version>
<scope>${global.scope}</scope>
</dependency>
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-connect-avro-converter</artifactId>
<version>${confluent.version}</version>
<scope>${global.scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>connect-api</artifactId>
<version>${kafka.version}</version>
</dependency>
We cannot run Spark-Kafka streaming jobs asynchronously. But we can run them in parallel, as Kafka consumer(s) do. For that, we need to set following configuration in SparkConf():
sparkConf.set("spark.streaming.concurrentJobs","4")
By default, its value is "1". But we can override it to a higher value.
I hope this helps!

MongoDB Scala - query document for a specific field value

So I know that in Mongo Shell, you use dot notation to get the field you want in any document.
How is dot notation achieved in MongoDB Scala. I'm confused as to how it works. Here is the code that fetches a document from a collection:
val record = collection.find().projection(fields(include("offset"), excludeId())).limit(1)
EDIT:
I'm trying to work on a mechanism to basically re-consume Kafka records at a point where the consumer was shutdown. To do this, I store my kafka records in an external database, and then try to fetch the most recent offset from there and start consuming from that point. Here is my Scala method that should do that:
def getLatestCommitOffsetFromDB(collectionName: String): Long = {
import com.mongodb.Block
import org.bson.Document
val printBlock = new Block[Document]() {
override def apply(document: Document): Unit = {
println(document.toJson)
}
}
import com.mongodb.async.SingleResultCallback
val callbackWhenFinished = new SingleResultCallback[Void]() {
override def onResult(result: Void, t: Throwable): Unit = {
System.out.println("Latest offset fetched from database.")
}
}
var obj: String = " "
try {
val record = collection.find().projection(fields(include("offset"), excludeId())).limit(1)
//TODO FIND A WAY TO GET THE VALUE AND STORE IT IN A VARIABLE
} catch {
case e: RuntimeException =>
logger.error(s"MongoDB Server Error : Unable to fetch data from collection : $collection")
logger.error(e.printStackTrace().toString())
}
obj.toLong
}
The problem isn't that I can fetch documents from Mongo, more-so that I'm trying to access a particular field in Mongo. The Document has four fields in it: topic, partition, message, and offset. I want to get the "offset" field and store that in a variable, so I can use it as a restarting point to re-consume Kafka records.
where do I go from there?
POM.xml
<?xml version="1.0" encoding="UTF-8"?>
http://maven.apache.org/xsd/maven-4.0.0.xsd">
4.0.0
<groupId>OffsetManagementPoC</groupId>
<artifactId>OffsetManagementPoC</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.12</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>2.11.8</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.25</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-streams</artifactId>
<version>0.10.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
<version>2.2.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.6.5</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.6.5</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.6.5</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>casbah_2.12</artifactId>
<version>3.1.1</version>
<type>pom</type>
</dependency>
<dependency>
<groupId>com.typesafe</groupId>
<artifactId>config</artifactId>
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>org.mongodb.scala</groupId>
<artifactId>mongo-scala-driver_2.12</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-compiler</artifactId>
<version>2.11.8</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>3.4.2</version>
</dependency>
<dependency>
<groupId>org.mongodb.scala</groupId>
<artifactId>mongo-scala-driver_2.11</artifactId>
<version>2.1.0</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>bson</artifactId>
<version>3.3.0</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongodb-driver-async</artifactId>
<version>3.4.3</version>
</dependency>
<dependency>
<groupId>org.mongodb.scala</groupId>
<artifactId>mongo-scala-bson_2.11</artifactId>
<version>2.1.0</version>
</dependency>
</dependencies>
You can modify your query this way:
import com.mongodb.MongoClient
import com.mongodb.client.MongoCollection
import com.mongodb.client.model.Projections
def getLatestCommitOffsetFromDB(
databaseName: String,
collectionName: String
): Long = {
val mongoClient = new MongoClient("localhost", 27017);
val collection =
mongoClient.getDatabase(databaseName).getCollection(collectionName)
val record = collection
.find()
.projection(
Projections
.fields(Projections.include("offset"), Projections.excludeId()))
.first
record.get("offset").asInstanceOf[Double].toLong
}
I think you were missing the com.mongodb.client.model.Projections imports in order to use fields, include and excludeId
I used first instead of limit(1) to make it easier to extract the result.
first returns a Document object on which you can call get to retrieve the value of the requested field.
But in fact, since you just want one record and one field, you can remove the projection!:
val record = collection.find().first
According to the documentation, collection.find() accepts a com.mongodb.DBObject
One of the implementations of that interface that you can use is BasicDBObject which is basically like a mutable.Map[String, Object]. You can use the constructor which accepts a map like:
val query = new com.mongodb.BasicDBObject(Map(
"foo.bar" -> "value1"
"bar.foo" -> "value2"
))
val record = collection.find(query)....

kafka-apache flink execution log4j error

I'm trying to run a simple Apache Flink script with Kafka inegration but I keep on having problems with the execution.
The script should read messages coming from a kafka producer, elaborate them, and then send back again, to an other topic, the result of the processing.
I've get this example from here:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Simple-Flink-Kafka-Test-td4828.html
The error I have is:
Exception in thread "main" java.lang.NoSuchFieldError:ALL
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenera‌tor.createJobGraph(S‌​treamingJobGraphGene‌​rator.java:86)
at org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph‌​(StreamGraph.java:42‌​9)
at org.apache.flink.streaming.api.environment.LocalStreamEnviro‌nment.execute(LocalS‌​treamEnvironment.jav‌​a:46)
at org.apache.flink.streaming.api.environment.LocalStreamEnviro‌nment.execute(LocalS‌​treamEnvironment.jav‌​a:33)
This is my code:
public class App {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");
//properties.setProperty("zookeeper.connect", "localhost:2181");
properties.setProperty("group.id", "javaflink");
DataStream<String> messageStream = env.addSource(new FlinkKafkaConsumer010<String>("test", new SimpleStringSchema(), properties));
System.out.println("Step D");
messageStream.map(new MapFunction<String, String>(){
public String map(String value) throws Exception {
// TODO Auto-generated method stub
return "Blablabla " + value;
}
}).addSink(new FlinkKafkaProducer010("localhost:9092", "demo2", new SimpleStringSchema()));
env.execute();
}
}
These are the pom.xml dependencies:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java_2.11</artifactId>
<version>0.10.2</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-core</artifactId>
<version>0.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>1.3.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_2.11</artifactId>
<version>1.3.1</version>
</dependency>
What could cause this kind of error?
Thanks
Luca
The problem is most likely caused by the mixture of different Flink versions you have defined in your pom.xml. In order to run this program, it should be enough to include the following dependencies:
<!-- Streaming API -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_2.11</artifactId>
<version>1.3.1</version>
</dependency>
<!-- In order to execute the program from within your IDE -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_2.11</artifactId>
<version>1.3.1</version>
</dependency>
<!-- Kafka connector dependency -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_2.11</artifactId>
<version>1.3.1</version>
</dependency>