Kafka 0.8.2 custom API libraries?

I am using Kafka 0.8.2 and have implemented the standard poll and push calls. Now I want to add pollByIndex methods, which require implementing a SimpleConsumer.
Does anybody know of a library that already provides methods like this? Implementing SimpleConsumer can be a lot of work :)
Upgrading to 0.9 to use the new Consumer API is not an option for me yet.

OK, so creating a SimpleConsumer is not really easy. Instead, I implemented another command that creates a new consumer group each time I query for results. The group is stored in ZooKeeper, and I then set the offset I need on the topic for that group. That way the offsets of the default streaming consumer group are left untouched. It creates a small load on ZooKeeper, but nothing significant.
After every query I also remove the group from ZooKeeper to keep it clean.
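To make the workaround more concrete, here is a minimal Java sketch. It assumes the ZooKeeper layout that the Kafka 0.8 high-level consumer uses for committed offsets (/consumers/<group>/offsets/<topic>/<partition>, holding the offset as a string). It only computes a unique group id, the znode paths and values to write before starting a consumer for that group, and the group root to delete recursively afterwards. The class name ThrowawayGroupPlan and the "query" prefix are hypothetical, and the actual ZooKeeper client calls are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;

/**
 * Hypothetical helper for the "throwaway consumer group" workaround on Kafka 0.8.x.
 * It only builds ZooKeeper paths and values; writing and deleting them is left to
 * whatever ZooKeeper client you already use.
 */
final class ThrowawayGroupPlan {
    final String groupId;
    final String topic;
    final Map<Integer, Long> startOffsets; // partition -> offset to start reading from

    ThrowawayGroupPlan(String prefix, String topic, Map<Integer, Long> startOffsets) {
        // A fresh, unique group per query, so the long-running group's offsets are never written.
        this.groupId = prefix + "-" + UUID.randomUUID();
        this.topic = topic;
        this.startOffsets = new TreeMap<>(startOffsets); // sorted by partition for predictable order
    }

    /** Root znode of the group; delete it recursively once the query is done. */
    String groupRoot() {
        return "/consumers/" + groupId;
    }

    /** Znode holding the committed offset of one partition for this group. */
    String offsetPath(int partition) {
        return groupRoot() + "/offsets/" + topic + "/" + partition;
    }

    /** (path, value) pairs to write before the consumer for this group is started. */
    List<String[]> offsetWrites() {
        List<String[]> writes = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : startOffsets.entrySet()) {
            writes.add(new String[] {offsetPath(e.getKey()), Long.toString(e.getValue())});
        }
        return writes;
    }
}
```

Because the group id is unique per query, nothing under the default group's own /consumers node is touched, which is what keeps the regular streaming consumer intact.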
It took more or less one extra day of coding.
With versions 0.9 and newer, the new Consumer API has its own calls for this situation (such as seek). Until then, more work is needed on the developer side.


Axon Server auto-scaling split/merge delay

I am implementing auto-scaling in an application that uses Axon Server and runs in k8s.
I have created REST endpoints in the application itself, which look at the local configuration (processors and their thread counts) and then call the Axon Server REST API to split or merge the processors accordingly. The intent is to trigger them from container lifecycle hooks.
As a result, if a new instance (pod) of the application is launched and configured for 2 threads on ProcessorA, my code makes 2 requests to the /v1/components/blah/processors/ProcessorA/segments/split?context=default endpoint on the server, so that both new threads are put to full use.
Likewise, when the pod is shut down, it makes 2 similar requests to the merge endpoint on the server.
When scaling up I see the processor split twice, as expected. However, on shutdown I don't see it merge twice unless I put a long (5 s) wait between the requests. That isn't likely to be particularly stable, so I'm wondering whether there's something else I should be doing.
Perhaps I ought to request the merge, loop until it has taken effect, and then request the next one. This seems like it would be excessively slow.
There is another, somewhat related question on SO, "Automatically scale Axon's tracking event processors", where Steven commented that Axon Server had no built-in auto-scaling at that point in time. I haven't seen anything more recent either.
As it stands, work is underway to improve the split/merge functionality. For one, the result of a split/merge will be returned; this has been resolved under issue #1001.
This should mean you no longer have to wait for the statuses to be updated, which is the likely reason it (seems to) take so long. This functionality will be part of Axon Framework / Server 4.4, by the way, which should be released relatively soon.
Beyond that, discussions are still underway about supporting auto-scaling. One requirement deemed important is the ability of a TrackingEventProcessor to process several segments per thread (issue #1434). This would, for example, let a TEP take over several segments to bridge the transition while scaling.
Eventually though, Axon Server should be able to do this for you. It's just not there yet.
So for now, I think the most pragmatic solution is indeed to wait for the result to show up in the status. As said, I expect 4.4 to improve on this by returning the result of the split/merge operation when it is called. Lastly, the Axon team is aware this can be improved further, which is why discussions on the matter are underway.
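A minimal Java sketch of "request a merge, wait until the status reflects it, then request the next one". The hooks requestMerge and segmentCount are hypothetical: in a real setup the first would call the merge endpoint from the question and the second would read the processor's segment count from Axon Server's status. Neither is an Axon API, and the timeout and poll interval are arbitrary assumptions.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.IntSupplier;

final class SequentialMerger {
    private SequentialMerger() { }

    /** Polls until the condition holds or the timeout elapses; returns whether it held. */
    static boolean awaitCondition(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                return false;
            }
        }
        return true;
    }

    /**
     * Issues up to `merges` merge requests one at a time. After each request it waits
     * until the reported segment count has dropped by one before sending the next.
     * requestMerge and segmentCount are hypothetical hooks, not Axon APIs.
     * Returns how many merges were observed to complete.
     */
    static int mergeSequentially(int merges, Runnable requestMerge, IntSupplier segmentCount,
                                 Duration timeout, Duration pollInterval) {
        int completed = 0;
        for (int i = 0; i < merges; i++) {
            int expected = segmentCount.getAsInt() - 1;
            requestMerge.run();
            boolean applied = awaitCondition(() -> segmentCount.getAsInt() <= expected, timeout, pollInterval);
            if (!applied) {
                break; // don't fire another merge against a status that never caught up
            }
            completed++;
        }
        return completed;
    }
}
```

With a short poll interval, the total wait is roughly the time the server actually needs per merge rather than a fixed 5 s sleep; once 4.4 returns the split/merge result directly, the polling step should become unnecessary.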

How can I get a list of event processors in Axon 3.1.1?

I am using Axon 3.1.1 and want to know how I can get the list of event processors in my configuration file.
I went through the SpringAmqpMessageSource source, but I am still not sure exactly how to do it.
I need it so that I can pass my events to the appropriate event handler on the query side.
List<Consumer<List<? extends EventMessage<?>>>> eventProcessors = new CopyOnWriteArrayList<>();
Update:
I am retrieving messages from a Kafka topic and want to wire them to specific event handlers, but since I can't get the event processors, I can't do that.
Can you please tell me how to do it if I am using Axon 3.0.5?
If you're using the SpringAmqpMessageSource, you don't need to retrieve the list of eventProcessors you've shared, as Axon automatically subscribes all the event handling components to it for you.
Subsequently, the events the message source receives are automatically pushed to all the listeners on your query side.
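To illustrate the subscription model, here is a simplified, self-contained Java model of a subscribable message source. It is not Axon code: SimpleEvent and ToyMessageSource are hypothetical stand-ins. In Axon 3 the corresponding contract is, as far as I know, SubscribableMessageSource.subscribe(Consumer<List<? extends M>>), which returns a Registration; here a Runnable plays that role. Processors register a consumer, and every batch the source receives is pushed to all of them.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Stand-in for an event message (NOT Axon's EventMessage). */
final class SimpleEvent {
    final String payload;

    SimpleEvent(String payload) {
        this.payload = payload;
    }
}

/** Simplified model of a subscribable message source. */
final class ToyMessageSource {
    // Same idea as the CopyOnWriteArrayList in the question: processors register themselves here.
    private final List<Consumer<List<SimpleEvent>>> eventProcessors = new CopyOnWriteArrayList<>();

    /** Registers a processor; the returned Runnable unsubscribes it again. */
    Runnable subscribe(Consumer<List<SimpleEvent>> processor) {
        eventProcessors.add(processor);
        return () -> eventProcessors.remove(processor);
    }

    /** Called when a batch arrives from the broker: every subscriber receives it. */
    void onMessage(List<SimpleEvent> batch) {
        for (Consumer<List<SimpleEvent>> processor : eventProcessors) {
            processor.accept(batch);
        }
    }
}
```

The copy-on-write list means a processor can subscribe or unsubscribe while a batch is being delivered without breaking the iteration, which is also why application code normally has no reason to reach into that list itself.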
As this is all handled by Axon's infrastructure under the hood, there is no straightforward way to pull them out of it for your own use (other than potentially wiring them yourself).
Hence, you shouldn't have to do this yourself.
But maybe I'm missing an obvious point here.
Could you elaborate a little more on why you need the list of handlers in the first place?

How to maintain state after streaming application restart?

I am trying to understand how state management in Spark Streaming works in general. If I run this example program twice, will the second run see the state from the first run?
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/StatefulNetworkWordCount.scala
Is there a way to achieve this? I am thinking about redeploying an application and I would like not to lose the current state.
tl;dr It depends on what you need the other instance to see. Checkpointing is usually a solution.
ssc.checkpoint(".") (at line 50 in StatefulNetworkWordCount) enables checkpointing, which (quoting the official documentation):
Spark Streaming needs to checkpoint enough information to a fault-tolerant storage system such that it can recover from failures.
A failure can be considered a form of redeployment. Redeployment is described in the official documentation under "Upgrading Application Code", which lists two cases:
Two instances run in parallel.
One is gracefully brought down, and the other reads the state from the checkpoint directory.
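The "recover from the checkpoint if it exists, otherwise start fresh" idea can be sketched without Spark. Spark's own entry point for this pattern is StreamingContext.getOrCreate(checkpointPath, creatingFunc); the Java below is only a toy model of the same idea, assuming a running word count persisted as a properties file. CheckpointedWordCount and its methods are hypothetical names, and real Spark checkpoints store much more (DStream lineage, metadata) than this.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Toy model of checkpoint-and-recover for a running word count (not Spark code). */
final class CheckpointedWordCount {
    private final Path checkpointFile;
    private final Map<String, Long> counts = new TreeMap<>();

    private CheckpointedWordCount(Path checkpointFile) {
        this.checkpointFile = checkpointFile;
    }

    /** Restores state from the checkpoint file if it exists, otherwise starts empty. */
    static CheckpointedWordCount getOrCreate(Path checkpointFile) {
        CheckpointedWordCount wc = new CheckpointedWordCount(checkpointFile);
        if (Files.exists(checkpointFile)) {
            Properties props = new Properties();
            try (Reader in = Files.newBufferedReader(checkpointFile)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            for (String word : props.stringPropertyNames()) {
                wc.counts.put(word, Long.parseLong(props.getProperty(word)));
            }
        }
        return wc;
    }

    /** Folds one batch of words into the running counts (the stateful step of the example). */
    void processBatch(String... words) {
        for (String w : words) {
            counts.merge(w, 1L, Long::sum);
        }
    }

    /** Writes the state to a temp file, then renames it over the previous checkpoint. */
    void checkpoint() {
        Properties props = new Properties();
        counts.forEach((w, c) -> props.setProperty(w, Long.toString(c)));
        Path tmp = checkpointFile.resolveSibling(checkpointFile.getFileName() + ".tmp");
        try {
            try (Writer out = Files.newBufferedWriter(tmp)) {
                props.store(out, "word counts");
            }
            Files.move(tmp, checkpointFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

A second run that calls getOrCreate on the same file sees the counts of the first run, which is the behaviour the question asks about; pointing it at a new, empty location starts from zero.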

Is my middle-tier MSMQ queue really necessary?

My scenario is this:
I have multiple webservers that:
need to communicate with the backend (IBus.Publish/IBus.Subscribe)
need to communicate with each-other (IBus.Publish/IBus.Subscribe)
Aside from the web servers, I have a number of Windows services that consume the same messages.
In order to make this work, I have the web servers send messages to a central hub, whose sole responsibility is to wrap each message in a new message type and publish it to all subscribers.
Can I somehow avoid this, so I can publish the messages directly from the webservers?
EDIT (Added some code) - Current situation:
... WebServer
_bus.Send(new Message{Body="SomethingChanged"});
... Hub
public void Handle(Message message){
_bus.Publish(new WrappedMessage{Message = message});
}
... Handlers (WebServers, WindowsServices etc)
public void Handle(WrappedMessage message){
//Actually do important stuff
}
Wanted situation:
... WebServer
_bus.Publish(new Message{Body="SomethingChanged"});
... Handlers (WebServers, WindowsServices etc)
public void Handle(Message message){
//Do important stuff
}
Well, there isn't anything that technically prevents you from publishing messages inside your web application, and likewise there's nothing that prevents you from subscribing to those messages in all instances of the same web application. The question is whether you should :)
Without knowing the details of your problem, my immediate feeling is that you would be better off using some kind of shared persistent storage for whatever it is you're trying to synchronize (a cache?), possibly with some kind of read replication if you'd like to scale out and make reads really fast.
Again, without knowing the details of your problem, I'll try to suggest something, and then you can see whether it inspires you to an even better solution... here goes:
Use MongoDB (possibly as a replica set if you want to scale out your read operations) as the persistent storage for the thing you're caching
Whenever something happens in the web application, bus.Send a message to your backend
In your backend message handler, update Mongo (which will automatically replicate to the read slaves)
Whenever you need to query your data, just query your Mongo set (using slaveOk=true whenever you can accept slightly stale values)
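The flow in the list above can be modelled in a few lines of Java. This is only an in-memory sketch under stated assumptions: a queue stands in for bus.Send to the backend, a ConcurrentHashMap stands in for MongoDB, and UpdateValue, SharedStore, Backend and WebNode are hypothetical names. No NServiceBus or MongoDB API is used. The point it shows is that web nodes only send commands and read, while the backend is the single writer.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Command a web node sends (stands in for bus.Send). */
final class UpdateValue {
    final String key;
    final String value;

    UpdateValue(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

/** In-memory stand-in for the shared store (MongoDB in the answer). */
final class SharedStore {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    void upsert(String key, String value) { data.put(key, value); }

    Optional<String> find(String key) { return Optional.ofNullable(data.get(key)); }
}

/** Backend endpoint: the only component that writes to the store. */
final class Backend {
    private final Queue<UpdateValue> inbox = new ConcurrentLinkedQueue<>();
    private final SharedStore store;

    Backend(SharedStore store) { this.store = store; }

    void send(UpdateValue command) { inbox.add(command); }

    /** Drains queued commands, as the backend message handler would; returns how many it handled. */
    int processPending() {
        int handled = 0;
        UpdateValue cmd;
        while ((cmd = inbox.poll()) != null) {
            store.upsert(cmd.key, cmd.value);
            handled++;
        }
        return handled;
    }
}

/** A web application instance: sends commands and reads, but keeps no state of its own. */
final class WebNode {
    private final Backend backend;
    private final SharedStore store;

    WebNode(Backend backend, SharedStore store) {
        this.backend = backend;
        this.store = store;
    }

    void somethingChanged(String key, String value) { backend.send(new UpdateValue(key, value)); }

    Optional<String> read(String key) { return store.find(key); }
}
```

Until the backend has handled the command, other web nodes still read the old value, which is the same "slightly stale" trade-off that slaveOk=true accepts.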
The reason I'm suggesting this alternative solution is that web applications (at least in .NET land) have a funny transient nature: IIS dictates their lifecycle, and at any given time you can have n instances running. This complicates matters if you keep state in them, which is why I think of the web application as a client, not a publisher.
A simpler solution is to keep state in something that does not come and go, e.g. a database. I'm suggesting Mongo because my guess is that you're worried about serving web requests fast, and MongoDB is fairly easy to install as a replica set where read operations are pretty fast (and, more importantly, horizontally scalable). So my guess is that this setup would make everything much simpler.
How does that sound?

CQRS/EventStore: How are failures to deliver events handled?

I'm getting into CQRS, and I understand that you have commands (application layer) and events (from the domain).
In the simple case where events are used to update the read model, can read model updates fail? If there is no bug, I can't see them failing, and as I am using EventStore, I know there is a commit flag that will retry failures.
So my question is: do I have to do anything in addition to EventStore to handle failures?
Coming from a world where you do everything in one transaction, the fact that things are now done separately worries me.
Of course, there may be cases where handling a published event fails in the read models.
You have to make sure you can detect that and solve it.
The nice thing is that you can replay all the events again and again, so you have the chance not only to fix the error but also to test the fix by replaying every single event if you want.
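A minimal Java sketch of rebuilding a read model by replaying events. The example domain (Deposited, a balance per account) and the class names EventLog and BalanceProjection are hypothetical, and the in-memory list only stands in for EventStore. The read model is derived entirely from the stored events, so after fixing a bug in apply you can discard it and call rebuild again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical domain event: an amount deposited into an account. */
final class Deposited {
    final String accountId;
    final long amount;

    Deposited(String accountId, long amount) {
        this.accountId = accountId;
        this.amount = amount;
    }
}

/** Append-only log standing in for the event store. */
final class EventLog {
    private final List<Deposited> events = new ArrayList<>();

    void append(Deposited e) { events.add(e); }

    List<Deposited> all() { return Collections.unmodifiableList(events); }
}

/** Read model: balance per account, derived only from events. */
final class BalanceProjection {
    private final Map<String, Long> balances = new HashMap<>();

    void apply(Deposited e) {
        balances.merge(e.accountId, e.amount, Long::sum);
    }

    long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }

    /** Discards any previous state and replays every stored event from the start. */
    static BalanceProjection rebuild(EventLog log) {
        BalanceProjection fresh = new BalanceProjection();
        for (Deposited e : log.all()) {
            fresh.apply(e);
        }
        return fresh;
    }
}
```

Because rebuild always starts from an empty projection, replaying twice gives the same result, which is what makes "replay to test the fix" safe.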
I use NServiceBus as my publishing mechanism, which lets me use an error queue. Using my other logging tools together with the error queue, I can easily determine what happened, since I have both the error log and the actual message that caused the error in the first place.
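The error-queue idea can be sketched in plain Java. This is an in-memory model, not NServiceBus's API: DeliveryWithErrorQueue and FailedMessage are hypothetical names. A message whose handler throws is kept together with the error text instead of being lost, and after the handler is fixed the queued messages can be retried.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** A failed message kept together with the reason it failed. */
final class FailedMessage<M> {
    final M message;
    final String error;

    FailedMessage(M message, String error) {
        this.message = message;
        this.error = error;
    }
}

/** Delivers messages to a handler and parks failures in an error queue (in-memory model). */
final class DeliveryWithErrorQueue<M> {
    private final Deque<FailedMessage<M>> errorQueue = new ArrayDeque<>();

    void deliver(M message, Consumer<M> handler) {
        try {
            handler.accept(message);
        } catch (RuntimeException ex) {
            // Keep the original message and the error side by side for later inspection.
            errorQueue.addLast(new FailedMessage<>(message, ex.toString()));
        }
    }

    /** Snapshot of the current error queue. */
    List<FailedMessage<M>> errors() {
        return new ArrayList<>(errorQueue);
    }

    /** Re-delivers every queued failure once; returns how many still fail. */
    int retryAll(Consumer<M> handler) {
        int pending = errorQueue.size();
        for (int i = 0; i < pending; i++) {
            deliver(errorQueue.pollFirst().message, handler);
        }
        return errorQueue.size();
    }
}
```

Messages that fail again during retryAll go back to the end of the queue, so nothing is dropped while the handler is still broken.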