Storm results visualization

I've spent hours trying to find the best way to visualize the results of my Storm system. There seems to be an endless number of technology combinations, and I'm getting completely lost.
I want to avoid using a database, so from what I understand my system should have the following components:
a message queuing system (such as Redis, Kafka, ActiveMQ, ...) that can be connected to my bolts.
a server that establishes a websocket connection with the browser and streams the messages to it.
a JavaScript library that updates the front end in real time.
Could you please correct me if I've got the architecture wrong? I would also appreciate knowing which combination of technologies is best.

As @Lan said, your question is too broad.
For minimal setups I personally use Redis and Storm together. Redis can serve as a basic queue (beware of persistence problems with Redis, and of clustering if you need it), as a shared memory space for Storm bolts and spouts (storing configuration, intermediate results, ...), and as a basic message broker (pub/sub support). It also performs very well in both latency and throughput.
You can then use a "classic" backend to bridge Redis topics to websockets, for instance Node.js with SockJS and a Redis client, but there are many more solutions to this problem in many languages.
The front-end library should follow from your server choice (for instance sockjs-client, or socket.io with Node.js), since these embed fallback strategies for browsers that do not support websockets.
To conclude, the best architecture is the one that fits your usage, so it depends.
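To make the bolt -> broker -> browser flow concrete, here is a minimal in-process sketch in Python (standard library only). `FanOutBroker`, `bolt_emit` and the channel name `storm-results` are made up for illustration: the broker stands in for a Redis pub/sub channel and each subscriber queue stands in for one websocket connection. A real setup would swap in a Redis client and a websocket server.

```python
import json
import queue


class FanOutBroker:
    """Hypothetical in-memory stand-in for a pub/sub channel (e.g. Redis)."""

    def __init__(self):
        self._subscribers = {}  # channel name -> list of subscriber queues

    def subscribe(self, channel):
        # Each subscriber gets its own queue, like one websocket connection.
        q = queue.Queue()
        self._subscribers.setdefault(channel, []).append(q)
        return q

    def publish(self, channel, payload):
        # Pub/sub semantics: deliver to current subscribers only, no storage.
        message = json.dumps(payload)
        receivers = self._subscribers.get(channel, [])
        for q in receivers:
            q.put(message)
        return len(receivers)


def bolt_emit(broker, word, count):
    """What a final bolt would do: publish its result instead of writing to a DB."""
    return broker.publish("storm-results", {"word": word, "count": count})
```

Note that a message published while nobody is subscribed is simply dropped, which is the behaviour to keep in mind when choosing pub/sub over a persistent queue.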

There are so many ways. I recently built a demo using Apache Storm + Kafka. For visualization I used jQuery --> Node.js (for a RESTful web service) --> Redis. This is just one example; there are many other combinations you can consider based on your use case.

Related

Publish to Apache Kafka topic from Angular front end

I need to create a solution that receives events from a web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country, and each one generates automatic events from time to time, as well as events when something happens.
Although this is a locked-down desktop application, it is built in Angular v8; that is, it runs in a webview.
I was researching scalable but reliable solutions and found that Apache Kafka seems to be a great fit. I know there are clients for NodeJS, but I couldn't find any option for Angular. Angular runs in the browser, so it must communicate with the backend through HTTP/S.
In the end, I realized the best way to send events from Angular is to create an API that just receives a message on an HTTP/S endpoint and publishes it to a Kafka topic. Or is there any adapter for Kafka that exposes topics as REST?
I suppose this approach is way faster than storing the message in a database. Is this statement correct?
Thanks in advance.
"this approach is way faster than storing the message in a database. Is this statement correct?"
It can be slower. Kafka is asynchronous, so don't expect to get a response in the same time it would take to perform a database read/write. (Again, that would require some API, and it also largely depends on the database used.)
"is there any adapter for Kafka that exposes topics as REST?"
Yes, the Confluent REST Proxy is an Apache2 licensed product.
There is also a project divolte/divolte-collector for collecting click-data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.
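A minimal sketch of such a do-it-yourself API, in Python with only the standard library. Everything named here is an assumption for illustration: `publish(topic, value_bytes)` is a hypothetical hook that a real service would back with a Kafka producer client, and using the URL path as the topic name is just one possible convention.

```python
import json
from http.server import BaseHTTPRequestHandler


def handle_event(path, body, publish):
    """Validate one POSTed event and forward it; returns an HTTP status code."""
    try:
        event = json.loads(body)
    except ValueError:
        return 400  # reject malformed JSON before it reaches the topic
    # Assumed convention: POST /kiosk-events publishes to topic "kiosk-events".
    topic = path.strip("/") or "events"
    publish(topic, json.dumps(event).encode("utf-8"))
    # 202: accepted for asynchronous processing, not yet consumed by anyone.
    return 202


def make_handler(publish):
    """Wrap handle_event in a handler class usable with http.server.HTTPServer."""

    class EventHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status = handle_event(self.path, self.rfile.read(length), publish)
            self.send_response(status)
            self.end_headers()

    return EventHandler
```

Returning 202 rather than 200 reflects the point made above: the endpoint only confirms that the event was handed to the producer, not that anything downstream has processed it.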

Akka.Net work queues

I have an existing distributed computing framework built on top of MassTransit and RabbitMQ. There is essentially a manager which responds with work based on requests. Each worker takes a certain number of items based on the physical machine specs. The worker then sends completion messages when done. It works rather well and seems to be highly scalable, since the only link is the service bus.
I recently evaluated Akka.Net to see whether it would be a simpler system for implementing the same pattern. After looking at it, I was somewhat confused about what exactly it is used for. It seems that if I wanted to do something similar, the manager would have to know about each worker ahead of time and send it work directly.
I believe I am missing something because that model doesn't seem to scale well.
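The request-driven pattern described in the question can be sketched independently of any bus or actor library. The Python below is only an illustration: `Manager`, `request_work` and `complete` are hypothetical names, not MassTransit or Akka.NET API. The point is that the manager reacts to requests and therefore never needs a list of workers up front.

```python
from collections import deque


class Manager:
    """Hands out work only in response to requests, so it keeps no worker list."""

    def __init__(self, items):
        self._todo = deque(items)
        self._in_flight = {}  # item -> id of the worker currently holding it
        self.done = []

    def request_work(self, worker_id, capacity):
        # A worker asks for up to `capacity` items (sized to its machine).
        batch = []
        while self._todo and len(batch) < capacity:
            item = self._todo.popleft()
            self._in_flight[item] = worker_id
            batch.append(item)
        return batch

    def complete(self, worker_id, item):
        # Completion message: only accept it from the worker holding the item.
        if self._in_flight.get(item) != worker_id:
            return False
        del self._in_flight[item]
        self.done.append(item)
        return True
```

In a message-based system, `request_work` and `complete` would be messages sent by the worker, with the batch returned as a reply to the sender.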
Service buses like MassTransit are built as reliable messaging services. Ensuring message delivery is the primary concern there.
Actor frameworks also use messages, but this is the only similarity. Messaging is only a means to an end, and it's not as reliable as with service buses. Actor frameworks are more oriented towards building high-performance, easily distributed system topologies, centered around actors as the primary unit of work. Conceptually, an actor is close to the Active Record pattern (however, this is a great simplification). Actors are also very lightweight. You can have millions of them living in the memory of the executing machine.
When it comes to performance, Akka.NET is able to send over 30 million messages/sec on a single VM (tested on 8 cores), which is a lot more than any service bus, but the characteristics also differ significantly.
On the JVM we know that Akka clusters can grow to 2400 machines. Unfortunately, we were not able to test what the limits of the .NET implementation are.
You have to decide what you really need: a messaging library, an actor framework, or a combination of both.
I agree with @Horusiath's answer. In addition, I'd say that in most cases you can replace a service bus with the messaging system of an actor model like Akka, but they are not in the same class.
Messaging is just one thing that Akka provides, and while it's a great feature, I wouldn't say it's the main one. When analyzing it as an alternative, you must first look at the benefits of the model itself and then look if the messaging capabilities are good enough for your use case. You can still use a dedicated external servicebus to distribute messages across different clusters and keep akka.net exchanging messages inside clusters for example.
But the point is that if you decide to use Akka.net, you won't be using it only for messaging.

Is Kafka ready for production use?

I have an application in production that has to process several gigabytes of messages per day. I like the Kafka architecture and performance a lot; it perfectly fits my needs.
I'd like to replace my messaging layer with Kafka at some point. Is the 0.7.1 version good enough for production use in terms of stability and consistency in performance?
It is definitely in use at several Big Data companies already, including LinkedIn, where it was created (and later open sourced), and Tumblr. Just Tumblr by itself handles many gigabytes of messages per day. I'm sure LinkedIn is way up there too. You can see a list of companies known to currently use it here:
https://cwiki.apache.org/confluence/display/KAFKA/Powered+By
Also, be sure to subscribe to their mailing list, there are lots of people actively trying it out and using it in production environments.
I'm sure it can handle whatever volume you can throw at it.
There is one critical feature I think Kafka is missing before it is ready for production.
"Flushing messages to disc if the producer can't reach any Kafka broker"
The issue was filed a long time ago here:
https://issues.apache.org/jira/browse/KAFKA-156
This feature would make the complete Kafka event pipeline even more robust for use cases where the producer always has to be able to send events. For example, when you track pageviews or like-button clicks and you don't want to miss any events, even if all Kafka brokers are unreachable.
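Until something like this exists in the producer, the behaviour can be approximated on the application side. Below is a minimal Python sketch under stated assumptions: `SpoolingProducer` is a made-up wrapper, and `send(event)` is a hypothetical callable that raises `ConnectionError` when no broker is reachable. It is not part of any Kafka client API.

```python
import json
import os


class SpoolingProducer:
    """Wraps a send function and spools events to a local file on failure."""

    def __init__(self, send, spool_path):
        self._send = send              # hypothetical: raises ConnectionError
        self._spool_path = spool_path  # JSON-lines file used as the spool

    def produce(self, event):
        try:
            self._send(event)
            return True
        except ConnectionError:
            # Broker unreachable: append as one JSON line and force it to disk.
            with open(self._spool_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(event) + "\n")
                f.flush()
                os.fsync(f.fileno())
            return False

    def replay(self):
        """Resend spooled events in order; keep whatever still fails."""
        if not os.path.exists(self._spool_path):
            return 0
        with open(self._spool_path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f if line.strip()]
        sent = 0
        try:
            for event in pending:
                self._send(event)
                sent += 1
        except ConnectionError:
            pass  # stop at the first failure to preserve ordering
        # Rewrite the spool with only the events that were not delivered.
        with open(self._spool_path, "w", encoding="utf-8") as f:
            for event in pending[sent:]:
                f.write(json.dumps(event) + "\n")
        return sent
```

This gives at-least-once behaviour at best: if the process dies between a successful resend and the spool rewrite, the event is sent again on the next `replay()`.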
I must agree with Dave: Kafka is a good tool, but it is missing some basic features. Some of them can be implemented manually, but then you need to ask what Kafka itself provides. Some of the missing things are:
(As Dave said) Flushing messages to disk when the producer fails to send them.
The ability for consumers to track which messages were handled (not just consumed) and which weren't, in case of a restart.
Monitoring - a way to get the current status of the entities in the system, such as the current size of the queue in the producer or the write/read rate at the brokers (these can be built, but are not part of the tool).
I have used Kafka for quite some time. Using the native Java and Python clients is preferable.
I had to struggle a lot to find a proper node.js client. I literally rewrote my whole code many times using different clients, as they had a lot of bugs.
I finally settled on franz-kafka for node.js.
Apart from that, maintaining the consumer offsets is a bit difficult. Kafka is also missing some good features, like the exchanges that exist in the AMQP-based Apache Qpid or RabbitMQ.
Still, it's distributed, it supports offline messages, and the performance is really impressive, so I too preferred it :)

Messaging, Queues and ESB's - I know where I want to be but not how to get there

To cut a long story short, I am working on a project where we are rewriting a large web application for all the usual reasons. The main aim of the rewrite is to separate this large single application running on a single server into many smaller decoupled applications, which can be run on many servers.
Ok here's what I would like:
I would like HTTP to be the main transport mechanism. When one application for example the CMS has been updated it will contact the broker via http and say "I've changed", then the broker will send back a 200 OK to say "thanks I got the message".
The broker will then look on its list of other applications who wanted to hear about CMS changes and pass the message to the url that the application left when it told the broker it wanted to hear about the message.
The other applications will return 200 OK when they receive the message, if not the broker keeps the message and queues it up for the next time someone tries to contact that application.
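The three behaviours described above (acknowledge the publisher, fan out to registered URLs, queue on failure) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a recommendation to build your own broker: `CallbackBroker` is a made-up name, and `post(url, message)` is a hypothetical transport hook that returns an HTTP status code where a real broker would perform an HTTP POST.

```python
from collections import defaultdict, deque


class CallbackBroker:
    """Toy broker: topics map to subscriber URLs, undelivered messages queue up."""

    def __init__(self, post):
        self._post = post                      # hypothetical transport hook
        self._subscribers = defaultdict(list)  # topic -> callback URLs
        self._pending = defaultdict(deque)     # url -> undelivered messages

    def subscribe(self, topic, url):
        if url not in self._subscribers[topic]:
            self._subscribers[topic].append(url)

    def publish(self, topic, message):
        # Queue for each subscriber, then attempt delivery right away.
        for url in self._subscribers[topic]:
            self._pending[url].append(message)
            self.flush(url)
        return 200  # "thanks, I got the message" for the publisher

    def flush(self, url):
        """Deliver queued messages to `url` in order until one is not acked."""
        queue = self._pending[url]
        while queue:
            try:
                status = self._post(url, queue[0])
            except OSError:
                status = None  # treat network errors like a missing ack
            if status != 200:
                break
            queue.popleft()
        return len(queue)  # number of messages still waiting
```

A message is removed from a subscriber's queue only after that subscriber answers 200, so a subscriber that is down keeps its backlog until the next `flush` succeeds. The queues here live in memory; a real broker would have to persist them to survive a restart.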
The problem is I don't even know where to start or what I need to make it happen. I've been looking at XMPP, ActiveMQ, RabbitMQ, Mule ESB etc. and can see I could spend the next year going around in circles with this stuff.
Could anyone offer any advice from personal experience as I would quite like to avoid learning lessons the hard way.
I've worked with JMS messaging in various software systems since around 2003. I've got a web app where the clients are effectively JMS topic subscribers. By the mere act of publishing a message to a topic, the message gets server-pushed to all the subscribing web clients.
The web client is Flex-based. Our middle-tier stack consists of:
Java 6
Tomcat 6
BlazeDS
Spring-Framework
ActiveMQ (JMS message broker)
BlazeDS can be configured as a bridge to JMS. It's a Tomcat servlet that responds to Flex client remoting calls but can also push messages to the clients when new messages appear in the JMS topic that it is configured for.
BlazeDS implements the Comet Pattern for doing server-side message push:
Asynchronous HTTP and Comet architectures
An introduction to asynchronous, non-blocking HTTP programming
Farata Systems has announced that they have modified BlazeDS to work with the Jetty continuations approach to implementing the Comet Pattern. This enables scaling to thousands of Comet connections against a single physical server.
Farata Systems Achieves Performance Breakthrough with Adobe BlazeDS
We are waiting for Adobe to implement support of Servlet 3.0 in BlazeDS themselves as basically we're fairly wedded to using Tomcat and Spring in combo.
The key to massively scalable Comet is to use Java NIO HTTP listeners in conjunction with a thread pool (such as an Executor from the Java 5 concurrency library). Servlet 3.0 provides an async, event-driven model for servlets that can be tied together with such an HTTP listener. Thousands (on the order of 10,000 to 20,000) of concurrent Comet connections can then be sustained against a single physical server.
Though in our case we are using Adobe Flex technology to turn web clients into event-driven messaging subscribers, the same could be done for any generic AJAX web app. In AJAX circles the technique of doing server-side message push is often referred to as Reverse AJAX. You may have caught that Comet is a play on words, as in the counterpart to Ajax (both household cleaners). The nice thing for us, though, is we just wire together our pieces and away we go. Generic AJAX web coders will have a lot more programming work to do. (Even a generic web app could play with BlazeDS, though - it just wouldn't have any use for the AMF marshaling that BlazeDS is capable of.)
Finally, Adobe and SpringSource are cooperating on a smoother, out-of-the-box integration of BlazeDS with the Spring Framework:
Adobe Collaborates with SpringSource for Enhanced Integration Between Flash and SpringSource Platforms
First of all, don't worry about ESBs. The situation you've described lies well within the bounds of straightforward message-oriented middleware. You only "need" an ESB if you're doing things like mediations, content-based routing, protocol transformations; things where the middleware does stuff to the message, on top of routing it to the right place.
If you have a diverse set of destination applications that need to speak to each other - and it sounds like you do - you're right that messaging over a language agnostic protocol (like XMPP, STOMP or HTTP) is a neat solution. It basically means you don't have to write and run loads of Java daemons to translate messages into your favourite flavour of JMS.
STOMP is increasingly supported by message brokers, especially by the open-source ones, and there's a number of different client libraries. It is a lightweight protocol, specifically designed for messaging so you get a much richer feature set out of the box than you would with HTTP.
For me, XMPP is a bit of a weak option as it's not so well supported on the server side, although it is fun to be able to IM your broker :)
If you are set on HTTP, OpenMQ is very good, and I've personally used its Universal Message Service - basically a webapp wrapper around JMS destinations. It provides a REST-ful interface, with a similar set of verbs as STOMP provides.
As someone has already said, what you're describing is basically the Publish/Subscribe model. This is very easily achieved using either an ESB or a message queue. I have had some experience with RabbitMQ. It's very good. Nothing gets lost, and it handles the publish/subscribe model very well. In the past, on small-scale systems, I have gone down the route of developing my own message broker with a bespoke protocol over HTTP. I wouldn't advise this, because as you start to develop it you keep thinking of ways to extend it.
RabbitMQ is developed in Erlang, but it has Java, .NET, Python, etc. clients that can hook into it very easily. I have used the .NET and Python clients, and they work well. I chose it for Erlang's reputation for creating solid systems that cope very well with multiple things going on at the same time. I would call them threads, but I think it's smarter than just threads; I remember mutterings of the Actor Model and mailboxes, which I recall were pretty neat.
I was in a similar position to yours, but with very bad experiences of other messaging systems (BizTalk et al.) that were too proprietary and tied you into a solution. If you can keep the messages separate from the transport and delivery mechanisms, then you can develop your system to your heart's content. I used JSON in the end, as the packet sizes are small. You could use anything you like. Some opt for SOAP messages, but I feel that these are way too heavy for most stuff, although SOAP does allow you to give XSD schemas to outsiders so that they can develop systems that interoperate with yours in the future.
http://www.rabbitmq.com/tutorials/tutorial-three-java.html is a link to the tutorial on the Publish/Subscribe model and how you would achieve it using a message queue system. It's for RabbitMQ, but to be honest the approach will work with an ESB and any other message queue system out there.
ESB (Enterprise Service Bus) - Consider this when your application interacts heavily with two or more external or separate applications that do not communicate in a common data format. Ex: some systems may accept objects, XML, JSON, SMTP, TCP/IP, HTTP, HTTPS etc.
ESB has many features like:
Routing, addressing, messaging styles, transport protocols, service messaging model.
Consider a queue system if the producer and consumer applications use the same data format.
Web services (SOAP / REST) are best if one application needs the other application in order to complete its workflow.
Use queues if the application needs asynchronous data transfer.
You're really talking about publish and subscribe with assured delivery. Most MOM software should easily support your use case.
As was already said earlier, using an ESB for your current case seems to me like smashing a fly with a hammer.
The ESB software itself will be time-consuming and will require maintenance. If you go for an open source solution, it might be more time-consuming than using a licensed solution (IBM, ORACLE, ...).
Of course an ESB would do the job, and it would be really easy to develop a solution on it, but setting up an ESB would be far more difficult than building the solution itself.
If your problem is limited to the case described, I would strongly suggest building a simple architecture on top of OpenMQ (or similar) and using it through JMS.

Multicasting, Messaging, ActiveMQ vs. MSMQ?

I'm working on a messaging/notification system for our products. Basic requirements are:
Fire and forget
Persistent set of messages, possibly updating, to stay there until the sender says to remove them
The libraries will be written in C#. Spring.NET just released a milestone build with lots of nice messaging abstraction, which is great - I plan on using it extensively. My basic question comes down to the question of message brokers. My architecture will look something like app -> message broker queue -> server app that listens, dispatches all messages to where they need to go, and handles the life cycle of those long-lived messages -> message broker queue or topic -> listening apps.
Finally, the question: Which message broker should I use? I am biased towards ActiveMQ - We used it on our last project and loved it. I can't really think of a single strike against it, except that it's Java, and will require java to be installed on a server somewhere, and that might be a hard sell to some of the people that will be using this service. The other option I've been looking at is MSMQ. I am biased against it for some unknown reason, and it also doesn't seem to have great multicast support.
Has anyone used MSMQ for something like this? Any pros or cons, stuff that might sway the vote one way or the other?
One last thing, we are using .NET 2.0.
I'm kinda biased as I work on ActiveMQ, but pretty much all of the benefits listed for MSMQ above also apply to ActiveMQ.
Some more benefits of ActiveMQ include
great support for cross language client access and multi protocol support
excellent support for enterprise integration patterns
a ton of advanced features like exclusive queues and message groups
The main downside you mention is that the ActiveMQ broker is written in Java; but you can run it on IKVM as a .NET assembly if you really want - or run it as a Windows service, or compile it to a DLL/EXE via GCJ. MSMQ may or may not be written in .NET - but it doesn't really matter much how it's implemented, right?
Irrespective of whether you choose MSMQ or ActiveMQ, I'd recommend at least considering the NMS API, which as you say is nicely integrated into Spring.NET. There is an MSMQ implementation of this API, as well as implementations for TIBCO, ActiveMQ and STOMP, the last of which will support any other JMS provider via StompConnect.
So by choosing NMS as your API you will avoid lockin to any proprietary technology - and you can then easily switch messaging providers at any point in time; rather than locking your code all into a proprietary API
Pros for MSMQ.
It is built into Windows
It supports transactions; it also supports queues with no transactions
It is really easy to set up
AD Integration
It is fast, but you would need to compare ActiveMQ and MSMQ for your traffic to know which is faster.
.NET supports it natively
Supports fire and forget
You can peek at the queue, if you have readers that just look. I'm not sure whether you can edit a message in the queue.
Cons:
4MB message size limit
2GB Queue size limit
Queue items are held on disk
Not a mainstream MS product; the docs are a bit iffy, or at least they were. It has been a few years since I used it.
Here is a good blog for MSMQ
Take a look at zeromq. It's one of the fastest message queues around.
I suggest you have a look at TIBCO Enterprise Message Service (EMS), a high-performance messaging product that supports multicasting, routing, and the JMS specification, and provides enterprise-wide features covering your requirements, such as fire-and-forget and message persistence using file/database storage with shared state.
As a reference, FedEx runs on TIBCO EMS as its messaging infrastructure.
http://www.tibco.com/software/messaging/enterprise_messaging_service/default.jsp
There are a lot of other references; if I listed them, you'd really be surprised.
There are so many options in that arena...
Free: MantaRay, a peer-to-peer, fully JMS-compliant system. The interesting part of MantaRay is that you only need to define where the message goes, and MantaRay routes it any way that will get your message to its destination, so it is more resistant to failures of individual nodes in your messaging fabric.
Paid: At my day job I administer an IBM WebSphere MQ messaging system with several hundred nodes and have found it to be very good. We also recently purchased Tibco EMS and it seems that it will be pretty nice to use as well.