I am using Storm 1.2.1. After setting things up according to the official documentation, I want to collect some metrics in my spout. The spout code is below, but the expected metric data never shows up in graphite-web.
Question 1: How do I use the new Metrics Reporting API correctly?
Question 2: How do I get the ack-count metric of the KafkaSpout that ships with Storm, using Storm's old or new metrics API?
Using the new API in the spout to count emitted tuples:
public static class MyTestWordSpout extends BaseRichSpout {
    public static Logger LOG = LoggerFactory.getLogger(MyTestWordSpout.class);
    boolean _isDistributed;
    SpoutOutputCollector _collector;
    private Counter tupleCounter;            // new metrics API (metrics V2)
    transient CountMetric ackcountMetric;    // old metrics API
    long msid = 0;

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        // new API: register a Codahale counter with the metrics V2 registry
        this.tupleCounter = context.registerCounter("tupleCount");
        // old API: register a CountMetric reported every 5 seconds
        ackcountMetric = new CountMetric();
        context.registerMetric("ack_count", ackcountMetric, 5);
    }

    public void close() {
    }

    public void nextTuple() {
        Utils.sleep(100);
        final String[] words = new String[] {"nathan", "mike", "jackson", "golda", "bertels"};
        final Random rand = new Random();
        final String word = words[rand.nextInt(words.length)];
        _collector.emit(new Values(word), msid++);
        this.tupleCounter.inc();
    }

    public void ack(Object msgId) {
        ackcountMetric.incr();
    }

    public void fail(Object msgId) {
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
storm.yaml:
storm.metrics.reporters:
  # Graphite Reporter
  - class: "org.apache.storm.metrics2.reporters.GraphiteStormReporter"
    daemons:
      - "supervisor"
      - "nimbus"
      - "worker"
    report.period: 1
    report.period.units: "SECONDS"
    graphite.host: "10.11.6.79"
    graphite.port: 2003

  # Console Reporter
  - class: "org.apache.storm.metrics2.reporters.ConsoleStormReporter"
    daemons:
      - "worker"
    report.period: 1
    report.period.units: "SECONDS"
Graphite browser: (screenshot; the expected spout metrics do not appear)
You can use this library: https://github.com/staslev/storm-metrics-reporter. Add this to your pom.xml:
<dependency>
<groupId>com.github.staslev</groupId>
<artifactId>storm-metrics-reporter</artifactId>
<version>1.5.0</version>
</dependency>
Add this configuration to your topology:
config.put(YammerFacadeMetric.FACADE_METRIC_TIME_BUCKET_IN_SEC, 30);
config.put(SimpleGraphiteStormMetricProcessor.GRAPHITE_HOST, "127.0.0.1");
config.put(SimpleGraphiteStormMetricProcessor.GRAPHITE_PORT, 2003);
config.put(SimpleGraphiteStormMetricProcessor.REPORT_PERIOD_IN_SEC, 10);
config.put(Config.TOPOLOGY_NAME, YOUR-TOPOLOGY.class.getCanonicalName());
config.registerMetricsConsumer(MetricReporter.class,
new MetricReporterConfig(".*", SimpleGraphiteStormMetricProcessor.class.getCanonicalName()), 1);
And add the following call to the prepare method of your bolts:
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    StormYammerMetricsAdapter.configure(stormConf, context, new MetricsRegistry());
}
Then you can check in your browser whether Graphite shows the metrics.
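Regarding Question 2: spout ack counts (including those of the KafkaSpout that ships with Storm) are also published through the old metrics API as the built-in __ack-count metric, so another option is to register a metrics consumer on the topology config. A minimal sketch using the LoggingMetricsConsumer bundled with Storm (the topology name and builder below are placeholders):
Config conf = new Config();
// Built-in metrics such as __ack-count, __emit-count and __fail-count are pushed to
// every registered metrics consumer; LoggingMetricsConsumer writes them to the
// worker's metrics.log. A custom IMetricsConsumer could forward them to Graphite instead.
conf.registerMetricsConsumer(org.apache.storm.metric.LoggingMetricsConsumer.class, 1);
StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());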
Related
I'm trying to build an event-processing pipeline using Apache Beam.
The steps in my pipeline are:
Read from Kafka topics in Avro format and deserialize the Avro using a schema registry
Create a fixed-size window (1 hour) triggering every 10 minutes (processing time)
Write Avro files to GCS, dividing directories by topic name (filename = schema + start-end window pane)
Now let's dive into the code.
This code shows how I read from Kafka. I use a custom deserializer and coder to deserialize properly using the schema registry (in my case Hortonworks).
KafkaIO.<String, AvroGenericRecord>read()
.withBootstrapServers(bootstrapServers)
.withConsumerConfigUpdates(configUpdates)
.withTopics(inputTopics)
.withKeyDeserializer(StringDeserializer.class)
.withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
.commitOffsetsInFinalize()
.withoutMetadata();
In the pipeline, after records are read by KafkaIO, windowing is applied.
records.apply(Window.<AvroGenericRecord>into(FixedWindows.of(Duration.standardHours(1)))
.triggering(AfterWatermark.pastEndOfWindow()
.withEarlyFirings(AfterProcessingTime.pastFirstElementInPane().plusDelayOf(Duration.standardMinutes(10)))
.withLateFirings(AfterPane.elementCountAtLeast(1))
)
.withAllowedLateness(Duration.standardMinutes(5))
.discardingFiredPanes()
)
What I want to achieve by this window is to group data by event time every 1 hour and trigger every 10 min.
After grouping by window, it starts writing into Google Cloud Storage (GCS).
public class WriteAvroFilesTr extends PTransform<PCollection<AvroGenericRecord>, WriteFilesResult<AvroDestination>> {
private String baseDir;
private int numberOfShards;
public WriteAvroFilesTr(String baseDir, int numberOfShards) {
this.baseDir = baseDir;
this.numberOfShards = numberOfShards;
}
@Override
public WriteFilesResult<AvroDestination> expand(PCollection<AvroGenericRecord> input) {
ResourceId tempDir = getTempDir(baseDir);
return input.apply(AvroIO.<AvroGenericRecord>writeCustomTypeToGenericRecords()
.withTempDirectory(tempDir)
.withWindowedWrites()
.withNumShards(numberOfShards)
.to(new DynamicAvroGenericRecordDestinations(baseDir, Constants.FILE_EXTENSION))
);
}
private ResourceId getTempDir(String baseDir) {
return FileSystems.matchNewResource(baseDir + "/temp", true);
}
}
And
public class DynamicAvroGenericRecordDestinations extends DynamicAvroDestinations<AvroGenericRecord, AvroDestination, GenericRecord> {
private static final DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss");
private final String baseDir;
private final String fileExtension;
public DynamicAvroGenericRecordDestinations(String baseDir, String fileExtension) {
this.baseDir = baseDir;
this.fileExtension = fileExtension;
}
@Override
public Schema getSchema(AvroDestination destination) {
return new Schema.Parser().parse(destination.jsonSchema);
}
@Override
public GenericRecord formatRecord(AvroGenericRecord record) {
return record.getRecord();
}
@Override
public AvroDestination getDestination(AvroGenericRecord record) {
Schema schema = record.getRecord().getSchema();
return AvroDestination.of(record.getName(), record.getDate(), record.getVersionId(), schema.toString());
}
@Override
public AvroDestination getDefaultDestination() {
return new AvroDestination();
}
@Override
public FileBasedSink.FilenamePolicy getFilenamePolicy(AvroDestination destination) {
String pathStr = baseDir + "/" + destination.name + "/" + destination.date + "/" + destination.name;
return new WindowedFilenamePolicy(FileBasedSink.convertToFileResourceIfPossible(pathStr), destination.version, fileExtension);
}
private static class WindowedFilenamePolicy extends FileBasedSink.FilenamePolicy {
final ResourceId outputFilePrefix;
final String fileExtension;
final Integer version;
WindowedFilenamePolicy(ResourceId outputFilePrefix, Integer version, String fileExtension) {
this.outputFilePrefix = outputFilePrefix;
this.version = version;
this.fileExtension = fileExtension;
}
@Override
public ResourceId windowedFilename(
int shardNumber,
int numShards,
BoundedWindow window,
PaneInfo paneInfo,
FileBasedSink.OutputFileHints outputFileHints) {
IntervalWindow intervalWindow = (IntervalWindow) window;
String filenamePrefix =
outputFilePrefix.isDirectory() ? "" : firstNonNull(outputFilePrefix.getFilename(), "");
String filename =
String.format("%s-%s(%s-%s)-(%s-of-%s)%s", filenamePrefix,
version,
formatter.print(intervalWindow.start()),
formatter.print(intervalWindow.end()),
shardNumber,
numShards - 1,
fileExtension);
ResourceId result = outputFilePrefix.getCurrentDirectory();
return result.resolve(filename, RESOLVE_FILE);
}
@Override
public ResourceId unwindowedFilename(
int shardNumber, int numShards, FileBasedSink.OutputFileHints outputFileHints) {
throw new UnsupportedOperationException("Expecting windowed outputs only");
}
@Override
public void populateDisplayData(DisplayData.Builder builder) {
builder.add(
DisplayData.item("fileNamePrefix", outputFilePrefix.toString())
.withLabel("File Name Prefix"));
}
}
}
I've written out my whole pipeline. It kind of works well, but I'm not sure whether I'm actually handling events by event time.
Could someone review my code (especially steps 1 and 2, where I read and group by windows) and tell me whether it windows by event time or not?
P.S. Every record in Kafka has a timestamp field inside.
UPD
Thanks, jjayadeep.
I included a custom TimestampPolicy in KafkaIO:
static class CustomTimestampPolicy extends TimestampPolicy<String, AvroGenericRecord> {
protected Instant currentWatermark;
CustomTimestampPolicy(Optional<Instant> previousWatermark) {
this.currentWatermark = previousWatermark.orElse(BoundedWindow.TIMESTAMP_MIN_VALUE);
}
@Override
public Instant getTimestampForRecord(PartitionContext ctx, KafkaRecord<String, AvroGenericRecord> record) {
currentWatermark = Instant.ofEpochMilli(record.getKV().getValue().getTimestamp());
return currentWatermark;
}
@Override
public Instant getWatermark(PartitionContext ctx) {
return currentWatermark;
}
}
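For reference, a minimal sketch of how such a policy can be plugged into the reader via withTimestampPolicyFactory (assuming the CustomTimestampPolicy above; the factory lambda receives the topic partition and the previous watermark):
KafkaIO.<String, AvroGenericRecord>read()
    .withBootstrapServers(bootstrapServers)
    .withTopics(inputTopics)
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializerAndCoder(BeamKafkaAvroGenericDeserializer.class, AvroGenericCoder.of(serDeConfig()))
    // drive record timestamps (and the watermark) from the event time embedded in each record
    .withTimestampPolicyFactory((tp, previousWatermark) -> new CustomTimestampPolicy(previousWatermark))
    .commitOffsetsInFinalize()
    .withoutMetadata();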
From the documentation here [1], processing time is used as the record timestamp (event time) by default in KafkaIO:
By default, record timestamp (event time) is set to processing time in KafkaIO reader and source watermark is current wall time. If a topic has Kafka server-side ingestion timestamp enabled ('LogAppendTime'), it can enabled with KafkaIO.Read.withLogAppendTime(). A custom timestamp policy can be provided by implementing TimestampPolicyFactory. See KafkaIO.Read.withTimestampPolicyFactory(TimestampPolicyFactory) for more information.
Also, processing time is the default timestamp policy, as documented below:
// set event times and watermark based on LogAppendTime. To provide a custom
// policy see withTimestampPolicyFactory(). withProcessingTime() is the default.
1 - https://beam.apache.org/releases/javadoc/2.4.0/org/apache/beam/sdk/io/kafka/KafkaIO.html
I have the following verticle for testing purposes:
public class UserVerticle extends AbstractVerticle {
private static final Logger log = LoggerFactory.getLogger(UserVerticle.class);
@Override
public void start(Future<Void> sf) {
log.info("start()");
JsonObject cnf = config();
log.info("start.config={}", cnf.toString());
sf.complete();
}
@Override
public void stop(Future<Void> sf) {
log.info("stop()");
sf.complete();
}
private void onMessage(Message<JsonObject> message) {
    log.info("onMessage(message={})", message);
}
}
It is deployed from the main verticle with:
vertx.deployVerticle("org.buguigny.cluster.UserVerticle",
new DeploymentOptions()
.setInstances(1)
.setConfig(new JsonObject()
.put(some_key, some_data)
),
ar -> {
if(ar.succeeded()) {
log.info("UserVerticle(uname={}, addr={}) deployed", uname, addr);
// continue when OK
}
else {
log.error("Could not deploy UserVerticle(uname={}). Cause: {}", uname, ar.cause());
// continue when KO
}
});
This code works fine.
I had a look at the Verticle documentation and discovered an init() callback method I hadn't seen before. As the documentation doesn't say much about what it really does, I defined it to see where in the life cycle of a verticle it gets called.
@Override
public void init(Vertx vertx, Context context) {
log.info("init()");
JsonObject cnf = context.config();
log.info("init.config={}", cnf.toString());
}
However, when init() is defined I get a java.lang.NullPointerException on the line where I call JsonObject cnf = config(); in start():
java.lang.NullPointerException: null
at io.vertx.core.AbstractVerticle.config(AbstractVerticle.java:85)
at org.buguigny.cluster.UserVerticle.start(UserVerticle.java:30)
at io.vertx.core.impl.DeploymentManager.lambda$doDeploy$8(DeploymentManager.java:494)
at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:320)
at io.vertx.core.impl.EventLoopContext.lambda$executeAsync$0(EventLoopContext.java:38)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:462)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
My questions are:
Q1: Any clue why the NullPointerException is thrown?
Q2: What is the purpose of init()? Is it internal to Vert.x, or can it be implemented by client code to, for example, define some fields in the verticle objects passed in the deployment config?
The init method is for internal usage and documented as such in the Javadoc. Here's the source code:
/**
 * Initialise the verticle.<p>
 * This is called by Vert.x when the verticle instance is deployed. Don't call it yourself.
 * @param vertx the deploying Vert.x instance
 * @param context the context of the verticle
 */
@Override
public void init(Vertx vertx, Context context) {
    this.vertx = vertx;
    this.context = context;
}
If init is mentioned in any user documentation, it's a mistake; please report it.
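That also explains the NullPointerException: overriding init() without delegating to the base class means the vertx and context fields are never set, so config() in start() fails. If you do override it, call super.init() first; a minimal sketch:
@Override
public void init(Vertx vertx, Context context) {
    // let AbstractVerticle store vertx/context so config() works later in start()
    super.init(vertx, context);
    log.info("init.config={}", context.config().toString());
}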
I am running multiple individual Kafka Streams applications. To this end I created a StreamManager that holds these streams. Below is the StreamManager class in essence.
public class StreamManager {
    public Map<String, BaseStream> streamMap = new HashMap<String, BaseStream>();
    public Map<String, ReadOnlyKeyValueStore<Long, BaseModel>> storeMap = new HashMap<String, ReadOnlyKeyValueStore<Long, BaseModel>>();

    public StreamManager(String bootstrapServer) {
        initialize(bootstrapServer);
    }

    /**
     * Initialize streams. Right now hard coding creation of streams here.
     * @param bootstrapServer
     */
    public void initialize(String bootstrapServer) {
        Properties s1Props = this.GetStreamingProperties(bootstrapServer, "STREAM_1");
        Properties s2Props = this.GetStreamingProperties(bootstrapServer, "STREAM_2");
        BaseStream s1Stream = new CompositeInfoStream(new KStreamBuilder(), s1Props);
        BaseStream s2Stream = new ImcInfoStream(new KStreamBuilder(), s2Props);
        streamMap.put(s1Stream.storeName, s1Stream);
        streamMap.put(s2Stream.storeName, s2Stream);
    }

    /**
     * Start all streams.
     */
    public void startStreams() {
        for (BaseStream stream : streamMap.values()) {
            stream.start();
        }
    }

    public static void main(String[] args) throws Exception {
        StreamManager mgr = new StreamManager(StreamingConfig.instance().MESSAGE_SERVER);
        mgr.startStreams();
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                try {
                } finally {
                    mgr.closeStreams();
                }
            }
        });
        int i = 0;
        while (true) {
            if (i++ == 0)
                mgr.logAllStreamStates();
            Thread.sleep(60000);
            if (i == 60) i = 0;
        }
    }
}
I initialize and start the streams and then let the process run in a loop. Now what I want is more control over the individual streams, so that I can start and kill them if need be (for some odd reason my streams often go into REBALANCING mode and don't come back). Currently, if one of the streams goes into REBALANCING, I have to kill the entire StreamManager (all streams) and restart it. What I would like to do is restart only the individual stream.
I would like to get a sense of how my architecture should look. Does Kafka Streams provide a mechanism to manage a cluster of streams? Can I use multiprocessing to accomplish this, and if so, could you guide me to some resources for doing so, keeping in mind that we use Windows for development and Linux for deployment?
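For what it's worth, a minimal sketch of restarting a single stream without touching the others. It assumes hypothetical BaseStream helpers getStreams() (exposing the underlying KafkaStreams instance) and rebuild() (recreating the topology), neither of which exists in the code above; kafkaStreams below stands for whatever KafkaStreams instance a BaseStream wraps. The state-listener call is the real KafkaStreams API and can be used to spot a stream stuck in REBALANCING:
// Hypothetical addition to StreamManager: restart just one stream by its store name.
public void restartStream(String storeName) {
    BaseStream old = streamMap.get(storeName);
    if (old == null) {
        return;
    }
    old.getStreams().close();          // assumed accessor; stops only this stream's threads
    BaseStream fresh = old.rebuild();  // assumed factory method that recreates topology + KafkaStreams
    streamMap.put(storeName, fresh);
    fresh.start();
}

// Real API: watch for REBALANCING so a watchdog can decide when to restart.
kafkaStreams.setStateListener((newState, oldState) -> {
    if (newState == KafkaStreams.State.REBALANCING) {
        // record the timestamp; call restartStream(...) if the stream stays here too long
    }
});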
I need help understanding the win:ext_timed window in Esper (CEP). I'm wondering why the older (first two) events still pop up in the update method even though they have "expired".
public class MyCepTest {
public static void main(String...args) throws Exception{
System.out.println("starting");
MyCepTest ceptest = new MyCepTest();
ceptest.execute();
System.out.println("end");
}
public void execute() throws Exception{
Configuration config = new Configuration();
config.addEventType(MyPojo.class);
EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider(config);
EPAdministrator admin = epService.getEPAdministrator();
EPStatement x1 = admin.createEPL(win);
EPStatement x2 = admin.createEPL(win2);
x1.setSubscriber(this);
x2.setSubscriber(this);
EPRuntime runtime = epService.getEPRuntime();
ArrayList<MyPojo> staffToSendToCep = new ArrayList<MyPojo>();
staffToSendToCep.add(new MyPojo(1, new Date(1490615719497L)));
staffToSendToCep.add(new MyPojo(2, new Date(1490615929497L)));
for(MyPojo pojo : staffToSendToCep){
runtime.sendEvent(pojo);
}
Thread.sleep(500);
System.out.println("round 2...");//why two first Pojos are still found? Shouldn't ext_timed(pojoTime.time, 300 seconds) rule them out?
staffToSendToCep.add(new MyPojo(3, new Date(1490616949497L)));
for(MyPojo pojo : staffToSendToCep){
runtime.sendEvent(pojo);
}
}
public void update(Map<String,Object> map){
System.out.println(map);
}
public static String win = "create window fiveMinuteStuff.win:ext_timed(pojoTime.time, 300 seconds)(pojoId int, pojoTime java.util.Date)";
public static String win2 = "insert into fiveMinuteStuff select pojoId,pojoTime from MyPojo";
}
class MyPojo{
int pojoId;
Date pojoTime;
MyPojo(int pojoId, Date date){
this.pojoId = pojoId;
this.pojoTime = date;
}
public int getPojoId(){
return pojoId;
}
public Date getPojoTime(){
return pojoTime;
}
public String toString(){
return pojoId+"#"+pojoTime;
}
}
I've been puzzled by this for a while; any help would be greatly appreciated.
See the processing model in the docs: http://espertech.com/esper/release-6.0.1/esper-reference/html/processingmodel.html
All incoming insert-stream events are delivered to listeners and subscribers, regardless of your window. A window, if one is in the query at all, defines the subset of events to consider and therefore defines what gets aggregated, pattern-matched or is available for iteration. Try "select * from MyPojo" for reference. My advice is to read up on external time; see http://espertech.com/esper/release-6.0.1/esper-reference/html/api.html#api-controlling-time
Usually when you want an "external time window" you want event time to drive engine time.
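Following that advice, a minimal sketch of switching the engine to external time (assuming the MyPojo class from the question). Note that win:ext_timed itself expires rows based on the timestamp values of arriving events, while CurrentTimeEvent drives engine time for everything else (time windows, patterns, output rate limiting):
Configuration config = new Configuration();
config.addEventType(MyPojo.class);
// stop the internal wall-clock timer; the application now controls engine time
config.getEngineDefaults().getThreading().setInternalTimerEnabled(false);

EPServiceProvider epService = EPServiceProviderManager.getDefaultProvider(config);
EPRuntime runtime = epService.getEPRuntime();

// advance engine time to the event's own timestamp before sending it
runtime.sendEvent(new CurrentTimeEvent(1490615719497L));
runtime.sendEvent(new MyPojo(1, new Date(1490615719497L)));

runtime.sendEvent(new CurrentTimeEvent(1490616949497L));
runtime.sendEvent(new MyPojo(3, new Date(1490616949497L)));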
I have created a GWT app in which I have a VerticalPanel where I log the details.
I'm doing client-side logging using a Logger.
Sample code:
public static VerticalPanel customLogArea = new VerticalPanel();
public static Logger rootLogger = Logger.getLogger("");
logerPanel.setTitle("Log");
scrollPanel.add(customLogArea);
logerPanel.add(scrollPanel);
if (LogConfiguration.loggingIsEnabled()) {
rootLogger.addHandler(new HasWidgetsLogHandler(customLogArea));
}
And I'm updating my vertical log panel using this code
rootLogger.log(Level.INFO,
"Already Present in Process Workspace\n");
But now my question is: I have to log server-side details into my vertical log panel as well.
My server-side GreetingServiceImpl code is:
public boolean createDirectory(String fileName)
throws IllegalArgumentException {
Boolean result = false;
try {
rootLogger.log(Level.INFO,
"I want to log this to my UI vertical log Panel");
System.out.println("log this to UI");
File dir = new File("D:/GenomeSamples/" + fileName);
if (!dir.exists()) {
result = dir.mkdir();
}
} catch (Exception e) {
e.printStackTrace();
}
return result;
}
Now I want to log these statements to my UI from here. How can I achieve this? Using rootLogger.log(Level.INFO, "I want to log this to my UI vertical log Panel"); the message is logged to the Eclipse console, but how do I log it to my UI on the client side?
Please let me know if anything is wrong with this question.
If I understood you right, you want to see your server log entries in the web interface. And of course, the Java logger and printStackTrace() won't help you with that: your GWT code is compiled to JavaScript and has nothing to do with the server's console and log files. Besides, your server can't "push" log entries to the client; it's up to the client to make requests. So if you want to track new log entries and move them to the client, you need to poll the server for new entries. And yet another problem: you may have many clients polling your servlet, so you should keep this multi-threading in mind.
This is how I see a probable implementation (it's just a concept and may contain some errors and misspellings):
Remote interface:
public interface GreetingService extends RemoteService {
List<String> getLogEntries();
boolean createDirectory(String fileName)throws IllegalArgumentException;
}
Remote Servlet:
public class GreetingServiceImpl extends RemoteServiceServlet implements GreetingService {
public static final String LOG_ENTRIES = "LogEntries";
public List<String> getLogEntries() {
List<String> entries = getEntriesFromSession();
List<String>copy = new ArrayList<String>(entries.size());
copy.addAll(entries);
//prevent loading the same entries twice
entries.clear();
return copy;
}
public boolean createDirectory(String fileName)throws IllegalArgumentException {
Boolean result = false;
try {
log("I want to log this to my UI vertical log Panel");
log("log this to UI");
File dir = new File("D:/GenomeSamples/" + fileName);
if (!dir.exists()) {
result = dir.mkdir();
}
} catch (Exception e) {
log("Exception occurred: " + e.getMessage());
}
return result;
}
private List<String> getEntriesFromSession() {
HttpSession session= getThreadLocalRequest().getSession();
List<String>entries = (List<String>)session.getAttribute(LOG_ENTRIES);
if (entries == null) {
entries = new ArrayList<String>();
session.setAttribute(LOG_ENTRIES,entries);
}
return entries;
}
private void log(String message) {
    getEntriesFromSession().add(message);
}
}
Simple implementation of polling (GWT client-side):
Timer t = new Timer() {
    @Override
    public void run() {
        greetingAsyncService.getLogEntries(new AsyncCallback<List<String>>() {
            public void onSuccess(List<String> entries) {
                // put entries into your vertical panel
            }
            public void onFailure(Throwable caught) {
                // handle exceptions
            }
        });
    }
};
// Schedule the timer to run once per second.
t.scheduleRepeating(1000);

greetingAsyncService.createDirectory(fileName, new AsyncCallback<Boolean>() {
    public void onSuccess(Boolean result) {
        // no need to poll anymore
        t.cancel();
    }
    public void onFailure(Throwable caught) {
        // handle exceptions
    }
});
As you can see, I have used the session to keep log entries, because the session is client-specific, so different clients will receive different logs. It's up to you to decide what to use; you may create your own Logger class that tracks users itself and gives the appropriate logs to the appropriate clients.
You may also want to save the level of your messages (INFO, ERROR etc.) and then display messages in different colors (red for ERROR, for instance). To do so, you need to store not a List<String> but some custom class of your own.
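For example, a minimal sketch of such a custom class (the name LogEntry is hypothetical; GWT-RPC requires it to be serializable and to have a no-arg constructor):
public class LogEntry implements java.io.Serializable {
    private String level;    // e.g. "INFO", "ERROR" - used by the client to pick a color
    private String message;

    public LogEntry() { }    // no-arg constructor required by GWT-RPC

    public LogEntry(String level, String message) {
        this.level = level;
        this.message = message;
    }

    public String getLevel() { return level; }
    public String getMessage() { return message; }
}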
You'd create a logging servlet that has the same methods as your logging framework to send log messages to your server via RPC.
Here are some sample RPC log methods you can use:
public interface LogService extends RemoteService {
public void logException(String logger, String priority, String message, String error, StackTraceElement[] stackTrace, String nativeStack);
}
public interface LogServiceAsync {
public void logException(String logger, String priority, String message, String error, StackTraceElement[] stackTrace, String nativeStack, AsyncCallback<Void> callback);
}
public class LogServiceImpl extends RemoteServiceServlet implements LogService {
public void logException(String loggerName, String priority, String logMessage, String errorMessage, StackTraceElement[] stackTrace, String nativeStack) {
Logger logger = getLogger(loggerName);
Level level = getLevel(priority);
// Create a Throwable to log
Throwable caught = new Throwable();
if (errorMessage != null && stackTrace != null) {
caught = new Throwable(errorMessage);
caught.setStackTrace(stackTrace);
}
//do stuff with the other passed arguments (optional)
logger.log(level, logMessage, caught);
}
}
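A hedged usage sketch on the client side (assuming the async interface above is generated the usual way with GWT.create): forward uncaught client exceptions to the server log.
final LogServiceAsync logService = GWT.create(LogService.class);

GWT.setUncaughtExceptionHandler(new GWT.UncaughtExceptionHandler() {
    public void onUncaughtException(Throwable e) {
        logService.logException("client", "ERROR", e.getMessage(), e.toString(),
                e.getStackTrace(), null /* no native stack in this sketch */,
                new AsyncCallback<Void>() {
                    public void onSuccess(Void v) { /* logged on the server */ }
                    public void onFailure(Throwable caught) { /* server logging failed; ignore */ }
                });
    }
});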
Although those implementations are very nice, forget about timers and repeated server queries. We have something better now.
It's possible to push data from the server to the client using Atmosphere, which supports WebSockets.