Use kafka to detect changes on values

I have a streaming application that continuously takes in a stream of coordinates along with some custom metadata, which includes a bit string. This stream is produced to a Kafka topic using the Producer API. Another application needs to process this stream (using the Streams API), store a specific bit from the bit string, and generate an alert whenever this bit changes.
Below is the continuous stream of messages that needs to be processed:
{"device_id":"1","status_bit":"0"}
{"device_id":"2","status_bit":"1"}
{"device_id":"1","status_bit":"0"}
{"device_id":"3","status_bit":"1"}
{"device_id":"1","status_bit":"1"} // need to generate alert with change: 0->1
{"device_id":"3","status_bit":"1"}
{"device_id":"2","status_bit":"1"}
{"device_id":"3","status_bit":"0"} // need to generate alert with change: 1->0
I would like to write these alerts to another Kafka topic, like this:
{"device_id":1,"init":0,"final":1,"timestamp":"somets"}
{"device_id":3,"init":1,"final":0,"timestamp":"somets"}
I can save the current bit in the state store using something like
streamsBuilder
    .stream("my-topic")
    .mapValues((key, value) -> value.getStatusBit())
    .groupByKey()
    .windowedBy(TimeWindows.of(Duration.ofMinutes(1)))
    .reduce((oldAggValue, newMessageValue) -> newMessageValue, Materialized.as("bit-temp-store"));
but I cannot work out how to detect a change from the previously stored bit. Do I need to query the state store somehow inside the processor topology? If yes, how? If not, what else could be done?
Any suggestions or ideas that I can try (maybe completely different from what I am thinking) are also appreciated. I am new to Kafka, and thinking in terms of event-driven streams is eluding me.
Thanks in advance.

I am not sure this is the best approach, but in a similar task I used an intermediate entity to capture the state change. In your case it would be something like
streamsBuilder.stream("my-topic").groupByKey()
    .aggregate(DeviceState::new, new Aggregator<String, Device, DeviceState>() {
        public DeviceState apply(String key, Device newValue, DeviceState state) {
            // Recompute the flag on every record, so it is cleared again once the bit stops changing.
            state.setChanged(!newValue.getStatusBit().equals(state.getStatusBit()));
            state.setStatusBit(newValue.getStatusBit());
            state.setDeviceId(newValue.getDeviceId());
            state.setKey(key);
            return state;
        }
    }, TimeWindows.of(…) …).filter((s, t) -> (t.changed())).toStream();
In the resulting topic you will have the changes. You can also add some attributes to DeviceState to initialise it first, depending on whether you want to send an event when the first record for a device arrives, etc.
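To make the comparison inside the aggregator above easier to follow, here is a minimal plain-Java sketch of the same idea without any Kafka dependency. The class name BitChangeDetector and the method onMessage are hypothetical, the HashMap only stands in for the state store, and the alert layout is copied from the question. It assumes that the first record for a device should not raise an alert.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the per-key comparison; the HashMap stands in for the state store.
final class BitChangeDetector {
    private final Map<String, String> lastBitByDevice = new HashMap<>();

    // Returns an alert as a JSON string when the bit changed, otherwise null.
    String onMessage(String deviceId, String statusBit, long timestamp) {
        // put() stores the new bit and hands back the previously stored one (or null).
        String previous = lastBitByDevice.put(deviceId, statusBit);
        if (previous == null || previous.equals(statusBit)) {
            return null; // first record for this device, or bit unchanged
        }
        return String.format(
            "{\"device_id\":%s,\"init\":%s,\"final\":%s,\"timestamp\":%d}",
            deviceId, previous, statusBit, timestamp);
    }

    public static void main(String[] args) {
        BitChangeDetector detector = new BitChangeDetector();
        // Same sequence as in the question: {device_id, status_bit}.
        String[][] stream = {
            {"1", "0"}, {"2", "1"}, {"1", "0"}, {"3", "1"},
            {"1", "1"}, {"3", "1"}, {"2", "1"}, {"3", "0"}
        };
        long ts = 0; // hypothetical timestamps, one per message
        for (String[] m : stream) {
            String alert = detector.onMessage(m[0], m[1], ts++);
            if (alert != null) {
                System.out.println(alert);
            }
        }
        // prints:
        // {"device_id":1,"init":0,"final":1,"timestamp":4}
        // {"device_id":3,"init":1,"final":0,"timestamp":7}
    }
}
```

In a real topology the map would be replaced by the aggregate's state (DeviceState in the answer), so that the previous bit survives restarts.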

Related

Late data handling | Apache Beam

Late data that has missed both the window and the .withAllowedLateness period is dropped from the pipeline, as documented here.
I have a few questions on this behavior:
How can we handle late data that is dropped from the pipeline? Can we add a default behavior, say, logging all late data to some catch-all bucket?
Can we have a metric (Google Dataflow Metrics/Beam) that reports how many messages are dropped from the pipeline due to excessive latency?

In general, late data is defined as elements that arrive so late that we prefer to drop them rather than process them any further. As far as I know, adding extra functionality to handle those messages would require substantial effort to modify the Java SDK. However, if you just want to log them, this is already done by the LateDataDroppingDoFnRunner code, which is responsible for dropping data from expired windows:
for (WindowedValue<InputT> input : concatElements) {
    BoundedWindow window = Iterables.getOnlyElement(input.getWindows());
    if (canDropDueToExpiredWindow(window)) {
        // The element is too late for this window.
        droppedDueToLateness.inc();
        WindowTracing.debug(
            "{}: Dropping element at {} for key:{}; window:{} "
                + "since too far behind inputWatermark:{}; outputWatermark:{}",
            LateDataFilter.class.getSimpleName(),
            input.getTimestamp(),
            key,
            window,
            timerInternals.currentInputWatermarkTime(),
            timerInternals.currentOutputWatermarkTime());
    }
}
Note that the log has DEBUG level so you might not see it. As explained here, to override the level in Dataflow, you can use --defaultWorkerLogLevel=DEBUG or, even better, specify a particular class such as --workerLogLevelOverrides={"org.apache.beam.sdk.util.WindowTracing":"DEBUG"}. Choosing your keys wisely can help expose information to identify the dropped message (i.e. data lineage).
As can be seen in the previous snippet, droppedDueToLateness is a Counter metric that is incremented each time we drop an element: droppedDueToLateness.inc();. You can monitor it using Stackdriver with resource type dataflow_job and metric custom.googleapis.com/dataflow/droppedDueToLateness.
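As a rough illustration of the drop-and-count behavior described above, here is a simplified plain-Java model. It is not Beam's code: the class LatenessFilter and its methods are hypothetical, and it assumes that a window counts as expired once the input watermark has passed the window's maximum timestamp plus the allowed lateness.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicLong;

// Simplified model of "drop elements of expired windows and count them".
final class LatenessFilter {
    private final Duration allowedLateness;
    // Plays the role of the droppedDueToLateness counter from the snippet above.
    private final AtomicLong droppedDueToLateness = new AtomicLong();

    LatenessFilter(Duration allowedLateness) {
        this.allowedLateness = allowedLateness;
    }

    // Returns true if the element is kept, false if its window has already expired.
    boolean accept(Instant windowMaxTimestamp, Instant inputWatermark) {
        Instant expiry = windowMaxTimestamp.plus(allowedLateness);
        if (expiry.isBefore(inputWatermark)) {
            droppedDueToLateness.incrementAndGet();
            return false;
        }
        return true;
    }

    long droppedCount() {
        return droppedDueToLateness.get();
    }

    public static void main(String[] args) {
        LatenessFilter filter = new LatenessFilter(Duration.ofSeconds(10));
        Instant windowEnd = Instant.ofEpochSecond(100);
        System.out.println(filter.accept(windowEnd, Instant.ofEpochSecond(105))); // true
        System.out.println(filter.accept(windowEnd, Instant.ofEpochSecond(111))); // false
        System.out.println(filter.droppedCount()); // 1
    }
}
```

A catch-all bucket would correspond to doing something with the element in the branch that returns false, which the runner code above does not offer as a hook.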

Is it possible to change MediaRecorder's stream?

getUserMedia(constraints).then(stream => {
  var recorder = new MediaRecorder(stream)
  recorder.start()
  recorder.pause()
  // get a new stream: getUserMedia(constraints_new)
  // how to update the recorder's stream here?
  recorder.resume()
})
Is it possible? I've tried creating a MediaStream and using the addTrack and removeTrack methods to change the stream's tracks, but without success (the recorder stops when I try to resume it with the updated stream).
Any ideas?

The short answer is no, it's not possible. The MediaStream recording spec explicitly describes this behavior: https://w3c.github.io/mediacapture-record/#dom-mediarecorder-start. It's bullet point 15.3 of that algorithm which says "If at any point, a track is added to or removed from stream’s track set, the UA MUST immediately stop gathering data ...".
But in case you only want to record audio you can probably use an AudioContext to proxy your streams. Create a MediaStreamAudioDestinationNode and use the stream that it provides for recording. Then you can feed your streams with MediaStreamAudioSourceNodes and/or MediaStreamTrackAudioSourceNodes into the audio graph and mix them in any way you desire.
Last but not least there are currently plans to add the functionality you are looking for to the spec. Maybe you just have to wait a bit. Or maybe a bit longer depending on the browser you are using. :-)
https://github.com/w3c/mediacapture-record/issues/167
https://github.com/w3c/mediacapture-record/pull/186

apache storm missing event detection based on time

I want to detect a missing event in a data stream (e.g. detect a customer request that has not been responded to within 1 hour of its reception).
Here, I want to detect that the "Response" event is missing and raise an alert.
I tried using tick tuples by setting TOPOLOGY_TICK_TUPLE_FREQ_SECS, but the tick frequency is configured at the bolt level, so a tick is not tied to any particular request and might fire, say, 15 minutes after a customer request was received.
@Override
public Map getComponentConfiguration() {
    Config conf = new Config();
    // Ask Storm to send this bolt a tick tuple every 1800 seconds (30 minutes).
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 1800);
    return conf;
}
^ this doesn't work.
Let me know in comments if any other information is required. Thanks in advance for the help.

This might help: http://storm.apache.org/releases/1.0.3/Windowing.html
You can define 5-minute windows, check the status of the events in the last window, and alert based on what was received.
Alternatively, create an intermediate bolt that maintains these windows and emits normal alert tuples (instead of tick tuples) in case of timeouts.
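The bookkeeping such an intermediate bolt would need can be sketched in plain Java, without any Storm dependency. The class PendingRequestTracker and its methods are hypothetical. The sketch assumes that requests and responses share an id, and that the bolt receives ticks much more often than the timeout (for example every minute for a one-hour timeout), so an alert is raised shortly after the deadline passes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Tracks requests that have not been answered yet and reports the ones that timed out.
final class PendingRequestTracker {
    private final long timeoutMillis;
    // request id -> time (epoch millis) at which the request was received
    private final Map<String, Long> pending = new HashMap<>();

    PendingRequestTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onRequest(String requestId, long receivedAtMillis) {
        pending.put(requestId, receivedAtMillis);
    }

    void onResponse(String requestId) {
        pending.remove(requestId); // answered in time, nothing to alert on
    }

    // Call this on every tick; returns the ids whose response is overdue and forgets them.
    List<String> onTick(long nowMillis) {
        List<String> timedOut = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= timeoutMillis) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
        return timedOut;
    }
}
```

Inside a bolt, onRequest and onResponse would be called for normal tuples and onTick for tick tuples, with one alert tuple emitted per returned id. For this to work, the request and the response for the same id have to reach the same bolt task, for example through a fields grouping on the id.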

Where to invoke SagaManager in CQRS event handling

I am new to microservices and CQRS event handling, and I am trying to understand them through one simple task. In this task, three external REST services handle one transaction/request. The three steps are:
step1: Create a customer.
step2: Create a business for the customer.
step3: Create an address for the business.
I want to implement a saga for these events with InMemorySagaRepository and a saga manager.
Where exactly do I have to initiate the SagaManager with the repository: in the RestController or in the CommandHandler?
Can you please help me understand the saga flow?
Thanks in advance.

Half a year later, I'm making an edit, as I've now taken a course held by Greg Young called Greg Young's CQRS, Domain Events, Event Sourcing and how to apply DDD.
I really recommend it to anyone thinking about CQRS. It helps A LOT to understand what things actually are.
Original answer
In our product we use Sagas as something that reacts to events.
This means that our sagas are really just Subscribers to a specific Event. The saga then holds some logic as to whether it should do something or not.
If the saga finds that an action should be taken, it creates a Command which it puts on the CommandBus.
This means that Sagas are just 'reactors' and use the same path in as a user would (skipping the APIs etc).
But what a Saga really is, and what it should do, differs from one person talking about them to the next. (Disclaimer: This is how I read these posts; they might actually all say the same thing, but in way too fluffy a way for me [+my team] to see that.)
http://blog.jonathanoliver.com/cqrs-sagas-with-event-sourcing-part-i-of-ii/ for example, raises the point that Sagas should not contain 'business logic' (anything that contains 'if' is business logic according to the post).
https://msdn.microsoft.com/en-us/library/jj591569.aspx Talks about Sagas as 'Process managers' which coordinate things between different Aggregates (remember that Aggregate1 can't talk to Aggregate2 directly, so a 'Process manager' is required to orchestrate the communication). To put it simply: Event -> Saga -> Command -> Event -> Saga... To reach the final destination.
https://lostechies.com/jimmybogard/2013/03/21/saga-implementation-patterns-variations/ Talks about two different patterns of what a Saga is. One is 'Publish-gatherer' which basically coordinates what should happen based on a Command. The other is 'Reporter', which just reports the status of things to where they need to go. It doesn't coordinate things, it just reports whatever it needs to report.
http://kellabyte.com/2012/05/30/clarifying-the-saga-pattern/ Has a write-up of what the Saga-pattern 'is'. The claim is that Sagas should/could compensate for different workflows that break.
http://cqrs.nu/Faq/sagas Has a very short description on what Sagas are and basically says 'They are state machines that lets aggregates react to other aggregates'.
So, given that, what is it you actually want the Saga to do? Should it coordinate everything? Or should it just react and not care what the Aggregates do?
My edited part
So, after taking the course on CQRS and talking with Greg about this, I've come to the conclusion that there is quite a lot of confusion out there on the web.
Let's start with just the concept 'Saga'. A Saga actually has nothing to do with CQRS. It's not a CQRS concept. A 'Saga' is a form of two-phase commit, only it's optimised for success rather than failure ( https://en.wikipedia.org/wiki/Compensating_transaction ).
Now, what most people mean when they talk CQRS and say "Saga" is "Process Manager". And process managers are quite complicated, it seems (Greg has a whole other course just for Process Managers).
Basically, what they do is manage the whole process of something (as the name suggests). The link to Microsoft is pretty much what it's all about.
To answer the question:
Where exactly do I have to initiate the SagaManager with the repository: in the RestController or in the CommandHandler?
Outside of them both. A Process Manager is its own thing. It spans aggregates and repositories. Conceptually it might be better to look at it as a user doing all the things you want the PM to do, just that you program the user's interaction and tell it what to listen for.
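As a minimal sketch of the Event -> Saga -> Command chain for the three steps in the question, the process manager can be modelled as a small state machine that turns each incoming event into the next command. Everything here is hypothetical: the class name OnboardingProcessManager, the event names (CustomerCreated, BusinessCreated, AddressCreated) and the command names (CreateBusiness, CreateAddress) are made up for illustration, and plain strings stand in for real message types and a command bus.

```java
// A process manager as a state machine: it listens for events and answers with commands.
final class OnboardingProcessManager {
    enum State { WAITING_FOR_CUSTOMER, WAITING_FOR_BUSINESS, WAITING_FOR_ADDRESS, DONE }

    private State state = State.WAITING_FOR_CUSTOMER;

    State state() {
        return state;
    }

    // Returns the next command to put on the command bus, or null if nothing should be sent.
    String handle(String event) {
        if (state == State.WAITING_FOR_CUSTOMER && event.equals("CustomerCreated")) {
            state = State.WAITING_FOR_BUSINESS;
            return "CreateBusiness";
        }
        if (state == State.WAITING_FOR_BUSINESS && event.equals("BusinessCreated")) {
            state = State.WAITING_FOR_ADDRESS;
            return "CreateAddress";
        }
        if (state == State.WAITING_FOR_ADDRESS && event.equals("AddressCreated")) {
            state = State.DONE; // process finished, nothing more to send
        }
        return null; // unexpected or final event: no command
    }
}
```

Neither the RestController nor the CommandHandler creates this object; whatever dispatches events would look up (or create) the instance for the current process and pass the event to handle.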
Disclaimer: I do not work for Greg, or for anyone who stands to gain from my promoting his courses. It's just that I learned a lot from it, so I recommend it just like I would recommend reading Eric Evans' book on DDD.

In my application I've built a Saga process manager using this MSDN documentation. My Saga is implemented in the Application Service layer; it listens to events from the Sales, Warehouse & Billing bounded contexts and, when an event occurs, sends commands via the Service Bus.
Here is a simple example; I hope it helps you to analyze how to build your saga (I am registering the saga as a handler in the Composition Root) ;):
SAGA:
public class SalesSaga : Saga<SalesSagaData>,
    ISagaStartedBy<OrderPlaced>,
    IMessageHandler<StockReserved>,
    IMessageHandler<PaymentAccepted>
{
    private readonly ISagaPersister storage;
    private readonly IBus bus;
    public SalesSaga(ISagaPersister storage, IBus bus)
    {
        this.storage = storage;
        this.bus = bus;
    }
    public void Handle(OrderPlaced message)
    {
        // Send ReserveStock command
        // Save SalesSagaData
    }
    public void Handle(StockReserved message)
    {
        // Restore & Update SalesSagaData
        // Send BillCustomer command
        // Save SalesSagaData
    }
    public void Handle(PaymentAccepted message)
    {
        // Restore & Update SalesSagaData
        // Send AcceptOrder command
        // Complete Saga (Dispose SalesSagaData)
    }
}
InMemorySagaPersister: (as the SalesSagaData ID I am using the OrderID, since it is unique across the whole process)
public sealed class InMemorySagaPersister : ISagaPersister
{
    private static readonly Lazy<InMemorySagaPersister> instance = new Lazy<InMemorySagaPersister>(() => new InMemorySagaPersister());
    private InMemorySagaPersister()
    {
    }
    public static InMemorySagaPersister Instance
    {
        get
        {
            return instance.Value;
        }
    }
    ConcurrentDictionary<int, ISagaData> data = new ConcurrentDictionary<int, ISagaData>();
    public T GetByID<T>(int id) where T : ISagaData
    {
        T value;
        var tData = new ConcurrentDictionary<int, T>(data.Where(c => c.Value.GetType() == typeof(T))
            .Select(c => new KeyValuePair<int, T>(c.Key, (T)c.Value))
            .ToArray());
        tData.TryGetValue(id, out value);
        return value;
    }
    public bool Save(ISagaData sagaData)
    {
        bool result;
        ISagaData existingValue;
        data.TryGetValue(sagaData.Id, out existingValue);
        if (existingValue == null)
            result = data.TryAdd(sagaData.Id, sagaData);
        else
            result = data.TryUpdate(sagaData.Id, sagaData, existingValue);
        return result;
    }
    public bool Complete(ISagaData sagaData)
    {
        ISagaData existingValue;
        return data.TryRemove(sagaData.Id, out existingValue);
    }
}

One approach might be to have some sort of starting command that starts the Saga. In this scenario it would be configured in your composition root to listen to a certain command type. Once a command has been received in your message dispatcher (or whatever middleware messaging stuff you have) it would look for any Sagas that have been registered to be started by the command. You would then create the Saga and pass it the command. It could then react to other commands and events as they happen.
In your scenario I would suggest that your Saga is a type of command handler, so it would be initiated upon receiving a command.

How to correctly save the viewmodel of a page to handle tombstoning

I'm building a WP7 app, and I'm now at the point of handling the tombstoning part of it.
What I am doing is saving the viewmodel of the page in the Page.State bag when the NavigatedFrom event occurs, and reading it back in the NavigatedTo (with some check to detect whether I should read from the bag or read from the real live data of the application).
First my VM was just a wrapper to the domain model
public string Nome
{
    get
    {
        return _dm.Nome;
    }
    set
    {
        if (value != _dm.Nome)
        {
            _dm.Nome = value;
            NotifyPropertyChanged("Nome");
        }
    }
}
But this didn't always work because when saving to the bag and then reading back, the domain model was not deserialized correctly.
Then I changed my VM implementation to be just a copy of the properties I needed from the DM:
public string Nome
{
    get
    {
        return _nome;
    }
    set
    {
        // Compare against the backing field (_nome).
        if (value != _nome)
        {
            _nome = value;
            NotifyPropertyChanged("Nome");
        }
    }
}
and with the constructor that does:
_nome = dm.Nome;
And now it works, but I was not sure if this is the right approach.
Thx
Simone

Any transient state information should be persisted in the Application.Deactivated event and then restored in the Application.Activated event for tombstoning support.
If you need to store anything between application sessions then you could use the Application.Closing event, but depending on what you need to store, you could just store it whenever it changes. Again, depending on what you need to store, you can either restore it in the Application.Launching event, or just read it when you need it.
The approach that you take depends entirely on your application's requirements, and the method and location that you use to store your data are also up to you (binary serialization to isolated storage is generally accepted as being the fastest).
I don't know the details of your application, but saving and restoring data in NavigatedFrom/NavigatedTo is unlikely to be the right place to do it if you are looking to implement support for tombstoning.

I'd recommend against making a copy of part of the model, as you'd (probably) then need to persist both the full (app-level) model and the page-level copy when handling tombstoning.
Again the most appropriate solution will depend on the complexity of your application and the models it uses.

Application.Activated/Deactivated is a good place to handle tombstoning.
See why OnNavigatedTo/From may not be appropriate for your needs here.
How to correctly handle application deactivation and reactivation - Peter Torr's Blog
Execution Model Overview for Windows Phone
