How to read Apache Storm topology Results - visualization

I'm new to Apache Storm and I installed it successfully on my Windows 8 machine. I started the WordCount topology locally to test it and it worked fine. However, I want to know exactly what the words are and what their counts are. Where can I read such a result?
Also, is there a way to visualize these results?
Please help.
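
One common way to see the actual words and counts when running the topology locally is to attach a simple logging bolt after the counting bolt. Below is a minimal sketch, assuming the usual storm-starter WordCount layout where the count bolt emits the fields ("word", "count"); the class name is made up, and older Storm releases use the backtype.storm package prefix instead of org.apache.storm:

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

// Hypothetical terminal bolt: logs every (word, count) pair it receives.
public class WordCountPrinterBolt extends BaseBasicBolt {

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Field names assume the upstream count bolt declares ("word", "count").
        // The count's concrete type (Integer vs Long) depends on that bolt, so read it as Object.
        String word = tuple.getStringByField("word");
        Object count = tuple.getValueByField("count");
        System.out.println(word + " -> " + count);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Terminal bolt: nothing is emitted downstream.
    }
}

Assuming the count bolt is registered under the id "count", it can be wired in with builder.setBolt("printer", new WordCountPrinterBolt()).globalGrouping("count") before submitting the topology. The Storm UI, by contrast, visualizes per-component statistics (emitted/acked tuples, latencies) rather than the tuple contents themselves, so logging the counts or writing them to a store is the usual way to inspect the actual words.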

Related

Integration tests for spring kafka producer and consumer

I want to write tests for a Spring Kafka producer and consumer. I have tried multiple ways:
EmbeddedKafka annotation
EmbeddedKafkaRule
EmbeddedKafkaBroker
etc...
Every time I get one error or another, and none of the examples posted on GitHub seem to run at all. I checked the Spring Kafka versions for compatibility as well.
Can someone share an example code base that was written recently and that you have seen run successfully?
There are hundreds of tests in the framework itself.
This is probably the most extensive one...
https://github.com/spring-projects/spring-kafka/blob/1b9a9451feea7cca16903f1c990c74c6be9b8ffb/spring-kafka/src/test/java/org/springframework/kafka/annotation/EnableKafkaIntegrationTests.java#L164-L176
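
As a starting point, here is a minimal sketch of a producer/consumer round trip with @EmbeddedKafka, assuming JUnit 5 and a recent spring-kafka / spring-kafka-test version; the topic name, group id and class names are made up, and it relies on KafkaTestUtils' default Integer-key / String-value serializers:

import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.test.EmbeddedKafkaBroker;
import org.springframework.kafka.test.context.EmbeddedKafka;
import org.springframework.kafka.test.utils.KafkaTestUtils;
import org.springframework.test.context.junit.jupiter.SpringJUnitConfig;

@SpringJUnitConfig(EmbeddedKafkaRoundTripTest.Config.class)
@EmbeddedKafka(partitions = 1, topics = "demo-topic")
class EmbeddedKafkaRoundTripTest {

    @Configuration
    static class Config {
        // Empty context; the EmbeddedKafkaBroker bean is registered by @EmbeddedKafka.
    }

    @Autowired
    private EmbeddedKafkaBroker broker;

    @Test
    void producesAndConsumesOneRecord() {
        // Consumer wired against the embedded broker (Integer keys / String values by default).
        Map<String, Object> consumerProps = KafkaTestUtils.consumerProps("sketch-group", "true", broker);
        consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        Consumer<Integer, String> consumer =
                new DefaultKafkaConsumerFactory<Integer, String>(consumerProps).createConsumer();
        broker.consumeFromAnEmbeddedTopic(consumer, "demo-topic");

        // Producer wired against the same embedded broker.
        Map<String, Object> producerProps = KafkaTestUtils.producerProps(broker);
        KafkaTemplate<Integer, String> template =
                new KafkaTemplate<>(new DefaultKafkaProducerFactory<>(producerProps));
        template.send("demo-topic", "hello embedded kafka");
        template.flush();

        ConsumerRecord<Integer, String> record = KafkaTestUtils.getSingleRecord(consumer, "demo-topic");
        Assertions.assertEquals("hello embedded kafka", record.value());
        consumer.close();
    }
}

If a test along these lines still fails to start, the usual suspects are mismatched spring-kafka, spring-kafka-test and kafka-clients versions on the classpath, which matches the compatibility issue mentioned above.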

Apache kylin and PostgreSQL

I'm a student working on my final-year project; the project is about data warehousing, BI, etc.
I was asked to work with Apache Kylin.
I did some research about it and learned a bit.
I also looked into whether it is possible to use PostgreSQL as the data warehouse and make it communicate with Apache Kylin to build cubes,
but I found nothing.
So would you please answer the following question:
Is it possible to make Apache Kylin communicate with a PostgreSQL DWH?
And if there is some hidden documentation about it, would you please share it?
Time is running out and I really appreciate your answers and guidance.
Thanks in advance.
Khalil
It's doable. Kylin provides a data source adapter framework for JDBC data sources, and PostgreSQL could be implemented as one of these adapters; MySQL is supported by default. You can check this link to learn more: http://kylin.apache.org/development/datasource_sdk.html
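
For orientation only: the JDBC data source route described on that page is wired up in kylin.properties. The sketch below mirrors the shape of the documented MySQL setup; the PostgreSQL connection values are assumptions, and the exact property names and the dialect/adaptor wiring should be verified against the SDK documentation linked above:

# Switch Kylin's default source from Hive to a JDBC source
kylin.source.default=8
# Hypothetical PostgreSQL connection settings - adjust host, database and credentials
kylin.source.jdbc.connection-url=jdbc:postgresql://dbhost:5432/warehouse
kylin.source.jdbc.driver=org.postgresql.Driver
kylin.source.jdbc.user=kylin_user
kylin.source.jdbc.pass=kylin_password
# A PostgreSQL dialect/adaptor is not bundled; it has to be built with the data source SDK
# linked above and referenced through the corresponding kylin.source.jdbc.* properties.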

Upgrading Kafka client from 0.8.2.0 to 0.11.0.0

Currently, at my company we are migrating from Kafka 0.8 to 0.11; the broker migration steps are clearly stated in the Kafka documentation here.
What I am stuck on is upgrading the Kafka clients (producers, consumers, spark-streaming). I can't find any documentation or articles clearly listing the required changes or steps to follow to upgrade the clients; all I found is the Javadoc for the producer client.
What I have done so far is change the Kafka client version in my Gradle build to kafka-clients-0.11.0.0, and everything went fine from the compilation point of view, with no code changes at all.
What I seek help with is: are there any expected problems I should take care of, or any pointers for client changes other than the kafka-clients version?
I went through lots of experiments to get this done.
For the consumers and producers, I just used the Kafka 0.11.0 consumers and producers.
The tricky part was replacing spark-streaming: the latest spark-streaming version only supports up to Kafka 0.10.x, which doesn't contain any of the updates related to the new broker.
What I recommend here: if you are about to write an application from scratch and your main goal is real-time streaming, go for the Kafka Streams API, it is just AWESOME! If you already have a Spark Streaming app (which was my case), you have to judge which matters more, i.e. whether to stay stuck with Kafka broker 0.10.x and Spark Streaming, which was experimental btw.
The benefits of having the streaming inside Kafka rather than Spark are the following:
Kafka Streams is a normal jar that can be embedded in any Java application, so you don't have to worry that much about deployment and environment.
Auto-scaling is very easy when using Kafka Streams with any scale set provided by a cloud service provider, unlike scaling an HDP cluster.
Monitoring with something like Prometheus is much easier.
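
To make the Kafka Streams recommendation above a bit more concrete, here is a minimal sketch of a standalone Streams application; the application id, broker address and topic names are made up, and it uses the current StreamsBuilder API (the 0.11-era releases still named the builder KStreamBuilder):

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsUpgradeSketch {

    public static void main(String[] args) {
        // Hypothetical application id, broker address and topic names.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "upgrade-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        // Trivial transformation: upper-case each value and forward it to another topic.
        input.mapValues(value -> value.toUpperCase()).to("output-topic");

        // The Streams runtime lives inside this plain JVM process - no cluster is needed.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Because this is just a jar running in a plain JVM, it can be scaled by starting more instances with the same application id, which is what the auto-scaling point above refers to.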

Can Eclipse/IntelliJ Idea be used to execute code on the cluster

Production system : HDP-2.5.0.0 using Ambari 2.4.0.1
Plenty of demands are coming in for executing a range of code (Java MR, Scala, Spark, R, etc.) atop HDP, but from an IDE on a desktop Windows machine.
For Spark and R, we have RStudio set up.
The challenge lies with Java, Scala and so on; also, people use a range of IDEs, from Eclipse to IntelliJ IDEA.
I am aware that the Eclipse Hadoop plugin is NOT actively maintained and also has plenty of bugs when working with the latest versions of Hadoop; for IntelliJ IDEA I couldn't find reliable information on the official website.
I believe the Hive and HBase client APIs are a reliable way to connect from Eclipse etc., but I am skeptical about executing MR or other custom Java/Scala code.
I referred to several threads like this and this; however, I still have the question: does any IDE like Eclipse/IntelliJ IDEA have official support for Hadoop? Even Spring Data for Hadoop seems to have lost traction, and it didn't work as expected 2 years ago anyway ;)
As a realistic alternative, which tool/plugin/library should be used to test MR and other Java/Scala code 'locally', i.e. on the desktop machine, using a standalone version of the cluster?
Note: I do not wish to work against/in the sandbox; this is about connecting to the prod cluster directly.
I don't think there is a general solution that would work equally for all Hadoop services. Each has its own development, testing and deployment scenarios, as they are different standalone products. For the MR case you can use MRUnit to simulate your work locally from the IDE. Another option is LocalJobRunner. They both allow you to check your MR logic directly from the IDE. For Storm you can use the backtype.storm.Testing library to simulate a topology's workflow. But all of these are used from the IDE without direct cluster communication, unlike the Spark and RStudio integration.
As for the MR recommendation: your job should ideally pass through the following lifecycle - write the job and test it locally using MRUnit, then run it on a development cluster with some test data (see MiniCluster as an option), and then run it on the real cluster with some custom counters, which will help you locate malformed data and properly maintain the job.
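
To make the MRUnit suggestion concrete, here is a minimal sketch of a mapper test that runs entirely inside the IDE with no cluster connection; the tokenizing mapper and all names are made up for illustration:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class TokenizerMapperTest {

    // Made-up mapper under test: splits each input line into (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    @Test
    public void mapsOneLineToWordCounts() throws IOException {
        // MapDriver feeds one record through the mapper and checks the emitted pairs, all in-process.
        MapDriver.newMapDriver(new TokenizerMapper())
                .withInput(new LongWritable(0), new Text("hadoop storm hadoop"))
                .withOutput(new Text("hadoop"), new IntWritable(1))
                .withOutput(new Text("storm"), new IntWritable(1))
                .withOutput(new Text("hadoop"), new IntWritable(1))
                .runTest();
    }
}

The same in-IDE approach applies to LocalJobRunner, which runs the whole job (mapper, shuffle, reducer) in a single JVM instead of testing one class at a time.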

OpenShift cloud computing configuration: is it possible to do it entirely in the cloud?

Is it possible to build a big data application in the cloud with Red Hat's PaaS OpenShift? I'm looking at how to build a Scala application with Hadoop (HDFS), Spark, and Apache Mahout in the cloud, but I can't find anything about it. I've seen something with Hortonworks, but nothing clear about how to install it in an OpenShift environment or how to add an HDFS node in the cloud either. Is it possible with OpenShift?
It's possible in Amazon, but my question is: is it possible in OpenShift?
It really depends on what you're ultimately trying to achieve. I know you mention building a big data application on OpenShift with Scala, but what will the application ultimately be doing?
I've gotten Hadoop running in a gear before, but if you want a better example, check out this quickstart to get an idea of how it's done: https://github.com/ryanj/flask-hbase-todos. I know it's not Scala, but here's a good article that will show you how to put together a Scala app: https://www.openshift.com/blogs/building-distributed-and-event-driven-applications-in-java-or-scala-with-akka-on-openshift.
What will the application ultimately be doing?:
Forecasting football match results for several football leagues, a web application (Ruby), and
statistical computation and data mining, with calculations in Scala
and Apache frameworks (Spark & Mahout).
We get the data via CSV files, process it, and save it in a NoSQL DB (Cassandra).
And all of this in the cloud (OpenShift), that's the idea.
I've seen the info at https://github.com/ryanj/flask-hbase-todos. I'll try that approach, but
with Scala.