Flink Streaming Event time Window - apache-kafka

I am running a simple example to test windowing based on event time. I can generate output with processing time, but when I use event time no output is produced. Please help me understand what I am doing wrong.
I am creating a sliding window of size 10 seconds that slides every 5 seconds, and at the end of the window the system should emit the number of messages received during that time.
input :
a,1513695853 (generated at 13th second, received at 13th second)
a,1513695853 (generated at 13th second, received at 13th second)
a,1513695856 (generated at 16th second, received at 19th second)
a,1513695859 (generated at 19th second, received at 19th second)
The 2nd field is the event timestamp, representing the 13th, 13th, 16th, and 19th seconds of a minute.
If I use a processing-time window:
Output:
(a,1)
(a,3)
(a,2)
But when I use event time, no output is printed. Please help me understand what is going wrong.
package org.apache.flink.window.training;
import java.io.InputStream;
import java.util.Properties;
import org.apache.flink.api.common.functions.FoldFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.AssignerWithPunctuatedWatermarks;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import com.fasterxml.jackson.databind.ObjectMapper;
public class SocketStream {
private static Properties properties = new Properties();
public static void main(String args[]) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
InputStream inputStream =
SocketStream.class.getClassLoader().getResourceAsStream("local-kafka-server.properties");
properties.load(inputStream);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
FlinkKafkaConsumer010<String> consumer =
new FlinkKafkaConsumer010<>("test-topic", new SimpleStringSchema(), properties);
DataStream<Element> socketStockStream =
env.addSource(consumer).map(new MapFunction<String, Element>() {
@Override
public Element map(String value) throws Exception {
String split[] = value.split(",");
Element element = new Element(split[0], Long.parseLong(split[1]));
return element;
}
}).assignTimestampsAndWatermarks(new TimestampExtractor());
socketStockStream.map(new MapFunction<Element, Tuple2<String, Integer>>() {
@Override
public Tuple2<String, Integer> map(Element value) throws Exception {
return new Tuple2<String, Integer>(value.getId(), 1);
}
}).keyBy(0).timeWindow(Time.seconds(10), Time.seconds(5))
.sum(1)
.print();
env.execute();
}
public static class TimestampExtractor implements AssignerWithPunctuatedWatermarks<Element> {
private static final long serialVersionUID = 1L;
@Override
public long extractTimestamp(Element element, long previousElementTimestamp) {
return element.getTimestamp();
}
@Override
public Watermark checkAndGetNextWatermark(Element lastElement, long extractedTimestamp) {
// TODO Auto-generated method stub
return null;
}
}
}

Event-time processing requires properly generated timestamps and watermarks.
The TimestampExtractor in your code does not generate watermarks; its checkAndGetNextWatermark() always returns null, so the event-time windows are never triggered and no output is emitted.
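For example, you could emit a watermark with every element. Here is a minimal sketch of such an assigner, assuming the element timestamps are epoch seconds as in your sample input (the one-second out-of-orderness bound is an arbitrary choice for this sketch):
public static class TimestampExtractor implements AssignerWithPunctuatedWatermarks<Element> {
    private static final long serialVersionUID = 1L;

    @Override
    public long extractTimestamp(Element element, long previousElementTimestamp) {
        // Flink expects event-time timestamps in milliseconds, so convert from epoch seconds.
        return element.getTimestamp() * 1000;
    }

    @Override
    public Watermark checkAndGetNextWatermark(Element lastElement, long extractedTimestamp) {
        // Emit a watermark after every element, allowing one second of out-of-orderness.
        return new Watermark(extractedTimestamp - 1000);
    }
}
Once the watermark advances past the end of a window, the 10-second sliding windows can fire and you should see output.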

Related

Best practice for implementing Micronaut/Kafka-Streams with more than one KStream/KTable?

There are several details about the example Micronaut/Kafka Streams application which I don't understand. Here is the example class from the documentation (original link: https://micronaut-projects.github.io/micronaut-kafka/latest/guide/#kafkaStreams).
My questions are:
Why are we returning only the source stream?
If we have multiple source KStream objects, e.g. to do a join, do we also need to make them Beans?
Do we also need to make each source KTable a Bean?
What happens if we don't make a source KStream or KTable a Bean? We currently have at least one project that does this but with no apparent problems.
import io.micronaut.configuration.kafka.streams.ConfiguredStreamBuilder;
import io.micronaut.context.annotation.Factory;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import javax.inject.Named;
import javax.inject.Singleton;
import java.util.Arrays;
import java.util.Locale;
import java.util.Properties;
@Factory
public class WordCountStream {
public static final String STREAM_WORD_COUNT = "word-count";
public static final String INPUT = "streams-plaintext-input";
public static final String OUTPUT = "streams-wordcount-output";
public static final String WORD_COUNT_STORE = "word-count-store";
@Singleton
@Named(STREAM_WORD_COUNT)
KStream<String, String> wordCountStream(ConfiguredStreamBuilder builder) {
// set default serdes
Properties props = builder.getConfiguration();
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
KStream<String, String> source = builder
.stream(INPUT);
KTable<String, Long> groupedByWord = source
.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
.groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
//Store the result in a store for lookup later
.count(Materialized.as(WORD_COUNT_STORE));
groupedByWord
//convert to stream
.toStream()
//send to output using specific serdes
.to(OUTPUT, Produced.with(Serdes.String(), Serdes.Long()));
return source;
}
}
Edit: Here's a version of our service with multiple streams, edited to remove identifying info.
@Factory
public class TopologyCopy {
private static class DataOut {}
private static class DataInOne {}
private static class DataInTwo {}
private static class DataInThree {}
@Singleton
@Named("data")
KStream<Integer, DataOut> dataStream(ConfiguredStreamBuilder builder) {
Properties props = builder.getConfiguration();
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG, LogAndContinueExceptionHandler.class);
KStream<Integer, DataInOne> dataOneStream = builder.stream("data-one",
Consumed.with(TextualIntSerde.INSTANCE, new JsonSerde<>(DataInOne.class)));
KStream<Integer, DataInTwo> dataTwoStream = builder.stream("data-two",
Consumed.with(TextualIntSerde.INSTANCE, new JsonSerde<>(DataInTwo.class)));
GlobalKTable<Integer, DataInThree> signalTable = builder.globalTable("data-three",
Consumed.with(TextualIntSerde.INSTANCE, new JsonSerde<>(DataInThree.class)),
Materialized.as("data-three-store"));
KTable<Integer, DataInTwo> dataTwoTable = dataTwoStream
.groupByKey()
.aggregate(() -> null, (key, device, storedDevice) -> device,
Materialized.with(TextualIntSerde.INSTANCE, new JsonSerde<>(DataInTwo.class)));
dataOneStream
.transformValues(() -> /* MAGIC */)
.join(dataTwoTable, (data1, data2) -> /* MAGIC */)
.selectKey((something, msg) -> /* MAGIC */)
.to("topic-out", Produced.with(Serdes.UUID(), new JsonSerde<>(OutMessage.class)));
return dataOneStream;
}
}

Writing to Google Cloud Storage from PubSub using Cloud Dataflow using DoFn

I am trying to write Google Pub/Sub messages to Google Cloud Storage using Google Cloud Dataflow. I know that TextIO/AvroIO do not support streaming pipelines. However, I read in a comment by the author of [1] that it is possible to write to GCS in a streaming pipeline from a ParDo/DoFn. I constructed a pipeline by following their article as closely as I could.
I was aiming for this behaviour:
Messages written out in batches of up to 100 to objects in GCS (one per window pane), under a path that corresponds to the time the message was published, in dataflow-requests/[isodate-time]/[paneIndex].
I get different results:
There is only a single pane in every hourly window. I therefore only get one file in every hourly 'bucket' (it's really an object path in GCS). Reducing MAX_EVENTS_IN_FILE to 10 made no difference; there is still only one pane/file.
There is only a single message in every GCS object that is written out.
The pipeline occasionally raises a CRC error when writing to GCS.
How do I fix these problems and get the behaviour I'm expecting?
Sample log output:
21:30:06.977 writing pane 0 to blob dataflow-requests/2016-04-08T20:59:59.999Z/0
21:30:06.977 writing pane 0 to blob dataflow-requests/2016-04-08T20:59:59.999Z/0
21:30:07.773 sucessfully write pane 0 to blob dataflow-requests/2016-04-08T20:59:59.999Z/0
21:30:07.846 sucessfully write pane 0 to blob dataflow-requests/2016-04-08T20:59:59.999Z/0
21:30:07.847 writing pane 0 to blob dataflow-requests/2016-04-08T20:59:59.999Z/0
Here is my code:
package com.example.dataflow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.transforms.windowing.*;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.gcloud.storage.BlobId;
import com.google.gcloud.storage.BlobInfo;
import com.google.gcloud.storage.Storage;
import com.google.gcloud.storage.StorageOptions;
import org.joda.time.Duration;
import org.joda.time.format.ISODateTimeFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
public class PubSubGcsSSCCEPipepline {
private static final Logger LOG = LoggerFactory.getLogger(PubSubGcsSSCCEPipepline.class);
public static final String BUCKET_PATH = "dataflow-requests";
public static final String BUCKET_NAME = "myBucketName";
public static final Duration ONE_DAY = Duration.standardDays(1);
public static final Duration ONE_HOUR = Duration.standardHours(1);
public static final Duration TEN_SECONDS = Duration.standardSeconds(10);
public static final int MAX_EVENTS_IN_FILE = 100;
public static final String PUBSUB_SUBSCRIPTION = "projects/myProjectId/subscriptions/requests-dataflow";
private static class DoGCSWrite extends DoFn<String, Void>
implements DoFn.RequiresWindowAccess {
public transient Storage storage;
{ init(); }
public void init() { storage = StorageOptions.defaultInstance().service(); }
private void readObject(java.io.ObjectInputStream in)
throws IOException, ClassNotFoundException {
init();
}
@Override
public void processElement(ProcessContext c) throws Exception {
String isoDate = ISODateTimeFormat.dateTime().print(c.window().maxTimestamp());
String blobName = String.format("%s/%s/%s", BUCKET_PATH, isoDate, c.pane().getIndex());
BlobId blobId = BlobId.of(BUCKET_NAME, blobName);
LOG.info("writing pane {} to blob {}", c.pane().getIndex(), blobName);
storage.create(BlobInfo.builder(blobId).contentType("text/plain").build(), c.element().getBytes());
LOG.info("sucessfully write pane {} to blob {}", c.pane().getIndex(), blobName);
}
}
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
options.as(DataflowPipelineOptions.class).setStreaming(true);
Pipeline p = Pipeline.create(options);
PubsubIO.Read.Bound<String> readFromPubsub = PubsubIO.Read.named("ReadFromPubsub")
.subscription(PUBSUB_SUBSCRIPTION);
PCollection<String> streamData = p.apply(readFromPubsub);
PCollection<String> windows = streamData.apply(Window.<String>into(FixedWindows.of(ONE_HOUR))
.withAllowedLateness(ONE_DAY)
.triggering(AfterWatermark.pastEndOfWindow()
.withEarlyFirings(AfterPane.elementCountAtLeast(MAX_EVENTS_IN_FILE))
.withLateFirings(AfterFirst.of(AfterPane.elementCountAtLeast(MAX_EVENTS_IN_FILE),
AfterProcessingTime.pastFirstElementInPane()
.plusDelayOf(TEN_SECONDS))))
.discardingFiredPanes());
windows.apply(ParDo.of(new DoGCSWrite()));
p.run();
}
}
[1] https://labs.spotify.com/2016/03/10/spotifys-event-delivery-the-road-to-the-cloud-part-iii/
Thanks to Sam McVeety for the solution. Here is the corrected code for anyone reading:
package com.example.dataflow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.PubsubIO;
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.*;
import com.google.cloud.dataflow.sdk.transforms.windowing.*;
import com.google.cloud.dataflow.sdk.values.KV;
import com.google.cloud.dataflow.sdk.values.PCollection;
import com.google.gcloud.WriteChannel;
import com.google.gcloud.storage.BlobId;
import com.google.gcloud.storage.BlobInfo;
import com.google.gcloud.storage.Storage;
import com.google.gcloud.storage.StorageOptions;
import org.joda.time.Duration;
import org.joda.time.format.ISODateTimeFormat;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Iterator;
public class PubSubGcsSSCCEPipepline {
private static final Logger LOG = LoggerFactory.getLogger(PubSubGcsSSCCEPipepline.class);
public static final String BUCKET_PATH = "dataflow-requests";
public static final String BUCKET_NAME = "myBucketName";
public static final Duration ONE_DAY = Duration.standardDays(1);
public static final Duration ONE_HOUR = Duration.standardHours(1);
public static final Duration TEN_SECONDS = Duration.standardSeconds(10);
public static final int MAX_EVENTS_IN_FILE = 100;
public static final String PUBSUB_SUBSCRIPTION = "projects/myProjectId/subscriptions/requests-dataflow";
private static class DoGCSWrite extends DoFn<Iterable<String>, Void>
implements DoFn.RequiresWindowAccess {
public transient Storage storage;
{ init(); }
public void init() { storage = StorageOptions.defaultInstance().service(); }
private void readObject(java.io.ObjectInputStream in)
throws IOException, ClassNotFoundException {
init();
}
@Override
public void processElement(ProcessContext c) throws Exception {
String isoDate = ISODateTimeFormat.dateTime().print(c.window().maxTimestamp());
long paneIndex = c.pane().getIndex();
String blobName = String.format("%s/%s/%s", BUCKET_PATH, isoDate, paneIndex);
BlobId blobId = BlobId.of(BUCKET_NAME, blobName);
LOG.info("writing pane {} to blob {}", paneIndex, blobName);
WriteChannel writer = storage.writer(BlobInfo.builder(blobId).contentType("text/plain").build());
LOG.info("blob stream opened for pane {} to blob {} ", paneIndex, blobName);
int i=0;
for (Iterator<String> it = c.element().iterator(); it.hasNext();) {
i++;
writer.write(ByteBuffer.wrap(it.next().getBytes()));
LOG.info("wrote {} elements to blob {}", i, blobName);
}
writer.close();
LOG.info("sucessfully write pane {} to blob {}", paneIndex, blobName);
}
}
public static void main(String[] args) {
PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
options.as(DataflowPipelineOptions.class).setStreaming(true);
Pipeline p = Pipeline.create(options);
PubsubIO.Read.Bound<String> readFromPubsub = PubsubIO.Read.named("ReadFromPubsub")
.subscription(PUBSUB_SUBSCRIPTION);
PCollection<String> streamData = p.apply(readFromPubsub);
PCollection<KV<String, String>> keyedStream =
streamData.apply(WithKeys.of(new SerializableFunction<String, String>() {
public String apply(String s) { return "constant"; } }));
PCollection<KV<String, Iterable<String>>> keyedWindows = keyedStream
.apply(Window.<KV<String, String>>into(FixedWindows.of(ONE_HOUR))
.withAllowedLateness(ONE_DAY)
.triggering(AfterWatermark.pastEndOfWindow()
.withEarlyFirings(AfterPane.elementCountAtLeast(MAX_EVENTS_IN_FILE))
.withLateFirings(AfterFirst.of(AfterPane.elementCountAtLeast(MAX_EVENTS_IN_FILE),
AfterProcessingTime.pastFirstElementInPane()
.plusDelayOf(TEN_SECONDS))))
.discardingFiredPanes())
.apply(GroupByKey.create());
PCollection<Iterable<String>> windows = keyedWindows
.apply(Values.<Iterable<String>>create());
windows.apply(ParDo.of(new DoGCSWrite()));
p.run();
}
}
There's a gotcha here, which is that you'll need a GroupByKey in order for the panes to be aggregated appropriately. The Spotify example references this as "Materialization of panes is done in “Aggregate Events” transform which is nothing else than a GroupByKey transform", but it's a subtle point. You'll need to provide a key in order to do this, and in your case, it appears a constant value will work.
PCollection<String> streamData = p.apply(readFromPubsub);
PCollection<KV<String, String>> keyedStream =
streamData.apply(WithKeys.of(new SerializableFunction<String, String>() {
public String apply(String s) { return "constant"; } }));
At this point, you can apply your windowing function, and then a final GroupByKey to get the desired behavior:
PCollection<KV<String, Iterable<String>>> keyedWindows = keyedStream.apply(...)
.apply(GroupByKey.create());
PCollection<Iterable<String>> windows = keyedWindows
.apply(Values.<Iterable<String>>create());
Now the elements in processElement will be Iterable<String>, with size 100 or more.
We've filed https://issues.apache.org/jira/browse/BEAM-184 to make this behavior clearer.
As of Beam 2.0, TextIO/AvroIO do support writing unbounded collections - see the documentation; in particular, you have to specify withWindowedWrites().
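For reference, a minimal sketch of that approach against the Beam 2.0 API (note the org.apache.beam.sdk packages instead of com.google.cloud.dataflow.sdk used above; the output prefix and shard count are placeholders):
// streamData is the unbounded PCollection<String> read from Pub/Sub, windowed as before
streamData
    .apply(Window.<String>into(FixedWindows.of(Duration.standardHours(1))))
    .apply(TextIO.write()
        .to("gs://myBucketName/dataflow-requests/output")  // placeholder output prefix
        .withWindowedWrites()                              // required for windowed/unbounded input
        .withNumShards(1));                                // an explicit shard count is needed for windowed writes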

Updating the color of rows of a TableView consumes too much CPU

I am making an application that receives alerts.
An alert can have 4 possible states:
Unresolved_New_0
Unresolved_New_1
Unresolved_Old
Resolved
When an alert is received, it is in the Unresolved_New_0 state. For 10 seconds, every 0.5s the state changes from Unresolved_New_0 to Unresolved_New_1 and vice versa. Depending on the state, I set a different background color on the table row (so that it flashes for 10s).
When the 10s pass, the alert transitions to the Unresolved_Old state, which causes its color to stop changing.
To implement this, I have a ScheduledThreadPoolExecutor to which I submit a Runnable implementation that, for a given duration, executes a runnable using Platform.runLater.
static class FxTask implements Runnable {
/**
*
* @param runnableDuring Runnable to be run while the task is active (run on the JavaFX application thread).
* @param runnableAfter Runnable to be run after the given duration is elapsed (run on the JavaFX application thread).
* @param duration Duration to run this task for.
* @param unit Time unit.
*/
public static FxTask create(final Runnable runnableDuring, final Runnable runnableAfter, final long duration, final TimeUnit unit) {
return new FxTask(runnableDuring, runnableAfter, duration, unit);
}
@Override
public void run() {
if (System.nanoTime() - mTimeStarted >= mTimeUnit.toNanos(mDuration) )
{
cancel();
Platform.runLater(mRunnableAfter);
}
else
Platform.runLater(mRunnableDuring);
}
private FxTask(final Runnable during, final Runnable after, final long duration, final TimeUnit unit) {
mRunnableDuring = during;
mRunnableAfter = after;
mDuration = duration;
mTimeUnit = unit;
mTimeStarted = System.nanoTime();
}
private final Runnable mRunnableDuring;
private final Runnable mRunnableAfter;
private final long mDuration;
private final TimeUnit mTimeUnit;
private final long mTimeStarted;
}
And I schedule Alerts using that Runnable as follows:
final Alert alert = new Alert(...);
scheduler.scheduleAtFixedRate(FxTask.create(
() -> {
switch (alert.alertStateProperty().get()) {
case UNRESOLVED_NEW_0:
alert.alertStateProperty().set(Alert.State.UNRESOLVED_NEW_1);
refreshTable(mAlertsTable);
break;
case UNRESOLVED_NEW_1:
alert.alertStateProperty().set(Alert.State.UNRESOLVED_NEW_0);
refreshTable(mAlertsTable);
break;
}
},
() -> { // This is run at the end
if (equalsAny(alert.alertStateProperty().get(), Alert.State.UNRESOLVED_NEW_0, Alert.State.UNRESOLVED_NEW_1)) {
alert.alertStateProperty().set(Alert.State.UNRESOLVED_OLD);
refreshTable(mAlertsTable);
}
},
10, TimeUnit.SECONDS), 0, 500, TimeUnit.MILLISECONDS
);
Note: alertStateProperty() is not shown on the TableView (it is not bound to any of its columns).
So in order to force JavaFX to redraw, I have to use refreshTable(), which unfortunately redraws the whole table (?).
public static <T> void refreshTable(final TableView<T> table) {
table.getColumns().get(0).setVisible(false);
table.getColumns().get(0).setVisible(true);
}
The problem is that even if I create a small number of Alerts at the same time, CPU usage goes very high: from 20% to 84% sometimes, averaging at about 40%. When the 10s pass for all alerts, CPU consumption returns to 0%. If I comment out refreshTable(), CPU stays near 0%, which indicates that it is the problem.
Why is so much CPU being used? (I have 8 cores by the way).
Is there another way to redraw just a single row without redrawing the whole table?
I even tried a 'hacky' method -- changing all values of the Alerts and then resetting them back to cause JavaFX to detect the change and redraw, but CPU was again at the same levels.
Probably the most efficient way to change the color of a table row is to use a table row factory, have the table row it creates observe the appropriate property, and update one or more CSS PseudoClass states as appropriate. Then just define the colors in an external css file.
Here's a standalone version of the application you described. I just used a Timeline to perform the "flashing new alerts", which is less code; but use the executor as you have it if you prefer. The key idea here is the table row factory, and the pseudoclass state it manipulates by observing the property. On my system, if I fill the entire table with new (flashing) rows, the CPU doesn't exceed about 35% (percentage of one core), which seems perfectly acceptable.
Note that PseudoClass was introduced in Java 8. In earlier versions of JavaFX you can achieve the same by manipulating the style classes instead, though you have to be careful not to duplicate any style classes as they are stored as a List. Anecdotally, the pseudoclass approach is more efficient.
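For completeness, here is a rough sketch of that style-class variant as a drop-in alternative to the pseudo-class update method used below (the class names are placeholders, and the stylesheet would then need ordinary .unresolved-new etc. class rules instead of the pseudo-class selectors at the end of this answer):
private static final List<String> STATE_STYLE_CLASSES =
        Arrays.asList("unresolved-new", "unresolved-new-alt", "unresolved-old", "resolved");

private void updateTableRowStyleClass(TableRow<Alert> row, Alert.State state) {
    // Remove any previously applied state classes first, so they never accumulate:
    row.getStyleClass().removeAll(STATE_STYLE_CLASSES);
    if (state == null) {
        return;
    }
    switch (state) {
        case UNRESOLVED_NEW_0: row.getStyleClass().add("unresolved-new"); break;
        case UNRESOLVED_NEW_1: row.getStyleClass().add("unresolved-new-alt"); break;
        case UNRESOLVED_OLD:   row.getStyleClass().add("unresolved-old"); break;
        case RESOLVED:         row.getStyleClass().add("resolved"); break;
    }
}
The full pseudo-class example follows: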
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javafx.animation.KeyFrame;
import javafx.animation.Timeline;
import javafx.application.Application;
import javafx.beans.binding.Bindings;
import javafx.beans.property.IntegerProperty;
import javafx.beans.property.ObjectProperty;
import javafx.beans.property.ReadOnlyObjectWrapper;
import javafx.beans.property.SimpleIntegerProperty;
import javafx.beans.property.SimpleObjectProperty;
import javafx.beans.property.SimpleStringProperty;
import javafx.beans.property.StringProperty;
import javafx.beans.value.ChangeListener;
import javafx.beans.value.ObservableValue;
import javafx.collections.ListChangeListener.Change;
import javafx.css.PseudoClass;
import javafx.geometry.Insets;
import javafx.geometry.Pos;
import javafx.scene.Node;
import javafx.scene.Scene;
import javafx.scene.control.Button;
import javafx.scene.control.ContentDisplay;
import javafx.scene.control.TableCell;
import javafx.scene.control.TableColumn;
import javafx.scene.control.TableRow;
import javafx.scene.control.TableView;
import javafx.scene.layout.BorderPane;
import javafx.scene.layout.HBox;
import javafx.scene.layout.Region;
import javafx.stage.Stage;
import javafx.util.Duration;
public class AlertTableDemo extends Application {
@Override
public void start(Stage primaryStage) {
TableView<Alert> table = new TableView<>();
table.getColumns().add(createColumn("Name", Alert::nameProperty));
table.getColumns().add(createColumn("Value", Alert::valueProperty));
TableColumn<Alert, Alert> resolveCol =
createColumn("Resolve", ReadOnlyObjectWrapper<Alert>::new);
resolveCol.setCellFactory(this::createResolveCell);
table.getColumns().add(resolveCol);
// just need a wrapper really, don't need the atomicity...
AtomicInteger alertCount = new AtomicInteger();
Random rng = new Random();
Button newAlertButton = new Button("New Alert");
newAlertButton.setOnAction( event ->
table.getItems().add(new Alert("Alert "+alertCount.incrementAndGet(),
rng.nextInt(20)+1)));
// set pseudo-classes on table rows depending on alert state:
table.setRowFactory(tView -> {
TableRow<Alert> row = new TableRow<>();
ChangeListener<Alert.State> listener = (obs, oldState, newState) ->
updateTableRowPseudoClassState(row, row.getItem().getState());
row.itemProperty().addListener((obs, oldAlert, newAlert) -> {
if (oldAlert != null) {
oldAlert.stateProperty().removeListener(listener);
}
if (newAlert == null) {
clearTableRowPseudoClassState(row);
} else {
updateTableRowPseudoClassState(row, row.getItem().getState());
newAlert.stateProperty().addListener(listener);
}
});
return row ;
});
// flash new alerts:
table.getItems().addListener((Change<? extends Alert> change) -> {
while (change.next()) {
if (change.wasAdded()) {
List<? extends Alert> newAlerts =
new ArrayList<>(change.getAddedSubList());
flashAlerts(newAlerts);
}
}
});
HBox controls = new HBox(5, newAlertButton);
controls.setPadding(new Insets(10));
controls.setAlignment(Pos.CENTER);
BorderPane root = new BorderPane(table, null, null, controls, null);
Scene scene = new Scene(root, 800, 600);
scene.getStylesheets().add(
getClass().getResource("alert-table.css").toExternalForm());
primaryStage.setScene(scene);
primaryStage.show();
}
private void flashAlerts(List<? extends Alert> newAlerts) {
Timeline timeline = new Timeline(new KeyFrame(Duration.seconds(0.5),
event -> {
for (Alert newAlert : newAlerts) {
if (newAlert.getState()==Alert.State.UNRESOLVED_NEW_0) {
newAlert.setState(Alert.State.UNRESOLVED_NEW_1);
} else if (newAlert.getState() == Alert.State.UNRESOLVED_NEW_1){
newAlert.setState(Alert.State.UNRESOLVED_NEW_0);
}
}
}));
timeline.setOnFinished(event -> {
for (Alert newAlert : newAlerts) {
if (newAlert.getState() != Alert.State.RESOLVED) {
newAlert.setState(Alert.State.UNRESOLVED_OLD);
}
}
});
timeline.setCycleCount(20);
timeline.play();
}
private void clearTableRowPseudoClassState(Node node) {
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("unresolved-new"), false);
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("unresolved-new-alt"), false);
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("unresolved-old"), false);
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("resolved"), false);
}
private void updateTableRowPseudoClassState(Node node, Alert.State state) {
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("unresolved-new"),
state==Alert.State.UNRESOLVED_NEW_0);
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("unresolved-new-alt"),
state==Alert.State.UNRESOLVED_NEW_1);
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("unresolved-old"),
state==Alert.State.UNRESOLVED_OLD);
node.pseudoClassStateChanged(PseudoClass.getPseudoClass("resolved"),
state==Alert.State.RESOLVED);
}
private TableCell<Alert, Alert> createResolveCell(TableColumn<Alert, Alert> col) {
TableCell<Alert, Alert> cell = new TableCell<>();
Button resolveButton = new Button("Resolve");
resolveButton.setOnAction(event ->
cell.getItem().setState(Alert.State.RESOLVED));
cell.setContentDisplay(ContentDisplay.GRAPHIC_ONLY);
cell.setAlignment(Pos.CENTER);
cell.graphicProperty().bind(
Bindings.when(cell.emptyProperty())
.then((Node)null)
.otherwise(resolveButton));
return cell ;
}
private <S, T> TableColumn<S, T> createColumn(String title,
Function<S, ObservableValue<T>> propertyMapper) {
TableColumn<S,T> col = new TableColumn<>(title);
col.setCellValueFactory(cellData -> propertyMapper.apply(cellData.getValue()));
col.setMinWidth(Region.USE_PREF_SIZE);
col.setPrefWidth(150);
return col ;
}
public static class Alert {
public enum State {
UNRESOLVED_NEW_0, UNRESOLVED_NEW_1, UNRESOLVED_OLD, RESOLVED
}
private final ObjectProperty<State> state = new SimpleObjectProperty<>();
private final StringProperty name = new SimpleStringProperty();
private final IntegerProperty value = new SimpleIntegerProperty();
public final ObjectProperty<State> stateProperty() {
return this.state;
}
public final AlertTableDemo.Alert.State getState() {
return this.stateProperty().get();
}
public final void setState(final AlertTableDemo.Alert.State state) {
this.stateProperty().set(state);
}
public final StringProperty nameProperty() {
return this.name;
}
public final java.lang.String getName() {
return this.nameProperty().get();
}
public final void setName(final java.lang.String name) {
this.nameProperty().set(name);
}
public final IntegerProperty valueProperty() {
return this.value;
}
public final int getValue() {
return this.valueProperty().get();
}
public final void setValue(final int value) {
this.valueProperty().set(value);
}
public Alert(String name, int value) {
setName(name);
setValue(value);
setState(State.UNRESOLVED_NEW_0);
}
}
public static void main(String[] args) {
launch(args);
}
}
alert-table.css:
.table-row-cell:resolved {
-fx-background: green ;
}
.table-row-cell:unresolved-old {
-fx-background: red ;
}
.table-row-cell:unresolved-new {
-fx-background: blue ;
}
.table-row-cell:unresolved-new-alt {
-fx-background: yellow ;
}

Create Scalding Source like TextLine that combines multiple files into single mappers

We have many small files that need combining. In Scalding you can use TextLine to read files as text lines. The problem is we get 1 mapper per file, but we want to combine multiple files so that they are processed by 1 mapper.
I understand we need to change the input format to an implementation of CombineFileInputFormat, and this may involve using Cascading's CombinedHfs. We cannot work out how to do this, but it should be just a handful of lines of code to define our own Scalding source called, say, CombineTextLine.
Many thanks to anyone who can provide the code to do this.
As a side question: we have some data in S3, and it would be great if the given solution works for S3 files - I guess it depends on whether CombineFileInputFormat or CombinedHfs works with S3.
You get the idea in your question, so here is a possible solution for you.
Create your own input format that extends CombineFileInputFormat and uses your own custom RecordReader. I am showing you Java code, but you could easily convert it to Scala if you want.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;
public class CombinedInputFormat<K, V> extends CombineFileInputFormat<K, V> {
public static class MyKeyValueLineRecordReader implements RecordReader<LongWritable,Text> {
private final RecordReader<LongWritable,Text> delegate;
public MyKeyValueLineRecordReader(CombineFileSplit split, Configuration conf, Reporter reporter, Integer idx) throws IOException {
FileSplit fileSplit = new FileSplit(split.getPath(idx), split.getOffset(idx), split.getLength(idx), split.getLocations());
delegate = new LineRecordReader(conf, fileSplit);
}
@Override
public boolean next(LongWritable key, Text value) throws IOException {
return delegate.next(key, value);
}
@Override
public LongWritable createKey() {
return delegate.createKey();
}
@Override
public Text createValue() {
return delegate.createValue();
}
@Override
public long getPos() throws IOException {
return delegate.getPos();
}
@Override
public void close() throws IOException {
delegate.close();
}
@Override
public float getProgress() throws IOException {
return delegate.getProgress();
}
}
@Override
public RecordReader getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException {
return new CombineFileRecordReader(job, (CombineFileSplit) split, reporter, (Class) MyKeyValueLineRecordReader.class);
}
}
Then you need to extend the TextLine class and make it use the input format you just defined (Scala code from now on).
import cascading.scheme.hadoop.TextLine
import cascading.flow.FlowProcess
import org.apache.hadoop.mapred.{OutputCollector, RecordReader, JobConf}
import cascading.tap.Tap
import com.twitter.scalding.{FixedPathSource, TextLineScheme}
import cascading.scheme.Scheme
class CombineFileTextLine extends TextLine{
override def sourceConfInit(flowProcess: FlowProcess[JobConf], tap: Tap[JobConf, RecordReader[_, _], OutputCollector[_, _]], conf: JobConf) {
super.sourceConfInit(flowProcess, tap, conf)
conf.setInputFormat(classOf[CombinedInputFormat[String, String]])
}
}
Create a scheme for your combined input.
trait CombineFileTextLineScheme extends TextLineScheme{
override def hdfsScheme = new CombineFileTextLine().asInstanceOf[Scheme[JobConf,RecordReader[_,_],OutputCollector[_,_],_,_]]
}
Finally, create your source class:
case class CombineFileMultipleTextLine(p : String*) extends FixedPathSource(p :_*) with CombineFileTextLineScheme
If you want to use a single path instead of multiple ones, the change to your source class is trivial.
I hope that helps.
This should do the trick: https://wiki.apache.org/hadoop/HowManyMapsAndReduces

Reading xls file in gwt

I am trying to read an xls file using GWT RPC. The same code executed fine as a plain Java program, but here it is unable to load the file and gives me a NullPointerException.
Following is the code:
import com.arosys.readExcel.ReadXLSX;
import com.google.gwt.user.server.rpc.RemoteServiceServlet;
import org.Preview.client.GWTReadXL;
import java.io.InputStream;
import com.arosys.customexception.FileNotFoundException;
import com.arosys.logger.LoggerFactory;
import java.util.Iterator;
import org.apache.log4j.Logger;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
/**
*
* @author Amandeep
*/
public class GWTReadXLImpl extends RemoteServiceServlet implements GWTReadXL
{
private String fileName;
private String[] Header=null;
private String[] RowData=null;
private int sheetindex;
private String sheetname;
private XSSFWorkbook workbook;
private XSSFSheet sheet;
private static Logger logger=null;
public void loadXlsxFile() throws Exception
{
logger.info("inside loadxlsxfile:::"+fileName);
InputStream resourceAsStream =ClassLoader.getSystemClassLoader().getSystemResourceAsStream("c:\\test2.xlsx");
logger.info("resourceAsStream-"+resourceAsStream);
if(resourceAsStream==null)
throw new FileNotFoundException("unable to locate give file");
else
{
try
{
workbook = new XSSFWorkbook(resourceAsStream);
sheet = workbook.getSheetAt(sheetindex);
}
catch (Exception ex)
{
logger.error(ex.getMessage());
}
}
}// end loadxlsxFile
public String getNumberOfColumns() throws Exception
{
int NO_OF_Column=0; XSSFCell cell = null;
loadXlsxFile();
Iterator rowIter = sheet.rowIterator();
XSSFRow firstRow = (XSSFRow) rowIter.next();
Iterator cellIter = firstRow.cellIterator();
while(cellIter.hasNext())
{
cell = (XSSFCell) cellIter.next();
NO_OF_Column++;
}
return NO_OF_Column+"";
}
}
I am calling it in the client program with this code:
final AsyncCallback<String> callback1 = new AsyncCallback<String>() {
public void onSuccess(String result) {
RootPanel.get().add(new Label("In success"));
if(result==null)
{
RootPanel.get().add(new Label("result is null"));
}
RootPanel.get().add(new Label("result is"+result));
}
public void onFailure(Throwable caught) {
RootPanel.get().add(new Label("In Failure"+caught));
}
};
try{
getService().getNumberOfColumns(callback1);
}catch(Exception e){}
}
Please tell me how I can resolve this issue, as the code runs fine when run as a normal Java program.
Why are you using the system classloader rather than the normal one?
But if you still want to use it, then look at this.
You are using it in a web application. In that case, you need to use the ClassLoader which is obtained as follows:
ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
This one has access to all the classpath paths tied to the web application in question, and you are no longer dependent on which parent classloader (a webapp has more than one!) has loaded your class.
Then, on this classloader, you just need to call getResourceAsStream() to get a classpath resource as a stream, not getSystemResourceAsStream(), which depends on how the web application is started. You don't want to depend on that either, since you have no control over it with external hosting:
InputStream input = classLoader.getResourceAsStream("filename.extension");
The file's location should be on your CLASSPATH.
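For example, a minimal sketch of how the question's loadXlsxFile() could be adapted, assuming test2.xlsx is packaged on the webapp's classpath (e.g. under WEB-INF/classes):
public void loadXlsxFile() throws Exception {
    // Use the webapp's context classloader instead of the system classloader,
    // and a classpath resource name instead of an absolute filesystem path.
    ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
    InputStream resourceAsStream = classLoader.getResourceAsStream("test2.xlsx");
    if (resourceAsStream == null) {
        throw new FileNotFoundException("unable to locate the given file");
    }
    workbook = new XSSFWorkbook(resourceAsStream);
    sheet = workbook.getSheetAt(sheetindex);
}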