How to run Apache Storm on a single node on Windows OS

How to run Apache Storm on a single node on Windows OS? Can anyone provide a link for that?

Install Java
Download and install a JDK (Storm works with both Oracle and OpenJDK 6/7). For this setup I used JDK 7 from Oracle.
I installed Java in:
C:\Java\jdk1.7.0_45\
Install Python
To test the installation, we’ll be deploying the “word count” sample from the storm-starter project which uses a multi-lang bolt written in python. I used python 2.7.6 which can be downloaded here.
I installed python in:
C:\Python27\
Install and Run Zookeeper
Download Apache Zookeeper 3.3.6 and extract it. Configure and run Zookeeper with the following commands:
> cd zookeeper-3.3.6
> copy conf\zoo_sample.cfg conf\zoo.cfg
> .\bin\zkServer.cmd
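For a single-node setup the copied zoo.cfg needs very little; the sample file's defaults look roughly like this (the dataDir shown is an assumed Windows path — point it at any writable directory):

```
# Minimal single-node ZooKeeper config (defaults from zoo_sample.cfg;
# dataDir here is illustrative for Windows)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=C:/zookeeper-3.3.6/data
clientPort=2181
```

The clientPort value (2181) is what Storm will connect to by default.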
Install Storm
The changes that allow Storm to run seamlessly on Windows have not been officially released yet, but you can download a build with those changes incorporated here.
(Source branch for that build can be found here).
Extract that file to the location of your choice. I chose C:.
Configure Environment Variables
On Windows Storm requires the STORM_HOME and JAVA_HOME environment variables to be set, as well as some additions to the PATH variable:
JAVA_HOME:
C:\Java\jdk1.7.0_45\
STORM_HOME:
C:\storm-0.9.1-incubating-SNAPSHOT-12182013\
PATH: (add)
%STORM_HOME%\bin;%JAVA_HOME%\bin;C:\Python27;C:\Python27\Lib\site-packages\;C:\Python27\Scripts\;
PATHEXT: (add)
.PY
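From an elevated command prompt, the same variables can be persisted with setx — a sketch using the paths above; adjust them to your actual install locations, and note that setx only affects prompts opened afterwards:

```
:: Persist the variables for future sessions (reopen the prompt to pick them up)
setx JAVA_HOME "C:\Java\jdk1.7.0_45"
setx STORM_HOME "C:\storm-0.9.1-incubating-SNAPSHOT-12182013"
setx PATH "%PATH%;%STORM_HOME%\bin;%JAVA_HOME%\bin;C:\Python27;C:\Python27\Lib\site-packages;C:\Python27\Scripts"
setx PATHEXT "%PATHEXT%;.PY"
```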
Start Nimbus, Supervisor, and Storm UI Daemons
For each daemon, open a separate command prompt.
Nimbus
cd %STORM_HOME%
storm nimbus
Supervisor
cd %STORM_HOME%
storm supervisor
Storm UI
cd %STORM_HOME%
storm ui
Verify that Storm is running by opening http://localhost:8080/ in a browser.
Deploy the “Word Count” Topology
Either build the storm-starter project from source, or download a pre-built jar.
Deploy the Word Count topology to your local cluster with the storm jar command:
storm jar storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar storm.starter.WordCountTopology WordCount -c nimbus.host=localhost
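Instead of passing -c on every submit, the Nimbus host can also go into %STORM_HOME%\conf\storm.yaml — a minimal single-node sketch using the usual local defaults:

```yaml
# Minimal single-node storm.yaml (assumed local defaults)
storm.zookeeper.servers:
  - "localhost"
nimbus.host: "localhost"
```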

Related

Can Apache Storm run on Windows 10 from within Eclipse?

In trying to learn about Apache Storm, I have been following the Unboxing Apache Storm series (https://www.youtube.com/watch?v=QoEyXKIKZKY&list=PLeUBsMTwZBi3rPzowug5-PdGA_cmD9QbE&index=1), except that I am working on Windows rather than a Linux system. For the Windows-specific installation, I followed https://www.techgeeknext.com/apache/install-apache-storm#google_vignette. I am using Eclipse to set up the main and other classes that contain the toy example in the series (video 7, "Creating First Java Project"). However, when I try to run the example, I get the error:
Unable to canonicalize address localhost/:2000 because it's not resolvable.
I thought the error was because I had clientPort in zookeeper.properties and storm.zookeeper.port in storm.yaml set to the default of 2181, so I changed them to 2000, but I still get the same error. I am using JDK 1.8.333, Apache Storm 1.2.3, Eclipse 2022-03 (4.23.0), Python 3.9.7, and ZooKeeper 3.4.14. If anyone has any suggestions, that would be great.
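Whichever port is chosen, the ZooKeeper clientPort and Storm's ZooKeeper port setting must agree — for example, with the 2181 default (values illustrative):

```
# zookeeper.properties (or zoo.cfg)
clientPort=2181

# storm.yaml — must match the ZooKeeper setting above
# storm.zookeeper.servers:
#   - "localhost"
# storm.zookeeper.port: 2181
```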

PySpark always uses the system's Python

A system can have two Pythons:
① the system's Python:
/usr/bin/python
② the user's Python:
~/anaconda3/envs/Python3.6/bin/python3
Now I have a cluster with my desktop (master) and laptop (slave).
Every mode of the PySpark shell works fine if I set this:
export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3
export PYSPARK_DRIVER_PYTHON=~/anaconda3/envs/Python3.6/bin/python3
in both nodes' ~/.bashrc.
However, I want to use it with Jupyter Notebook, so I set this in each node's ~/.bashrc instead:
export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3
export PYSPARK_DRIVER_PYTHON="jupyter"
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
then I get the log
My Spark version is:
spark-3.0.0-preview2-bin-hadoop3.2
I have read all the answers in
environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON
and
different version 2.7 than that in driver 3.6 in jupyter/all-spark-notebook
But no luck.
I guess the slave's Python 2.7 comes from the system's Python, not from Anaconda's Python.
How can I force Spark's slave node to use Anaconda's Python?
Thanks!
Jupyter is looking for IPython; you probably only have IPython installed in your system Python.
To use Jupyter with a different Python version, you need a Python version manager (pyenv) and a Python environment manager (virtualenv). Together they let you choose which Python version to run and which environment to install Jupyter into, with fully isolated Python versions and packages.
Install ipykernel in your chosen Python environment, then install Jupyter.
After you finish the steps above, you need to make sure the Spark worker switches to your chosen Python version and environment every time the Spark resource manager launches a worker executor. To switch the Python version and environment when the worker executor starts, make sure a little script runs right after the resource manager SSHes into the worker:
go to the python environment directory
source 'whatever/bin/activate'
After these steps, the Spark worker executor should run your chosen Python version and Jupyter.
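Another way to pin the workers to Anaconda's Python, without relying on ~/.bashrc (which non-interactive SSH sessions may not source), is to set PYSPARK_PYTHON in conf/spark-env.sh on every node — a sketch, assuming the Anaconda path from the question:

```shell
# conf/spark-env.sh on each node (master and slave).
# Spark sources this file itself, so it does not depend on .bashrc
# being read by the session that launches the executor.
export PYSPARK_PYTHON=~/anaconda3/envs/Python3.6/bin/python3
```

Keeping PYSPARK_DRIVER_PYTHON="jupyter" only in the driver's environment then lets the notebook run on the master while every executor uses the Anaconda interpreter.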

zookeeper doesn't start due to java.lang.ClassNotFoundException: org.apache.zookeeper.server.quorum.QuorumPeerMain

I searched this forum already; no working answer was found for my case:
I installed java 1.8
I downloaded the binary version of zookeeper-3.5.5 from https://www.apache.org/dist/zookeeper/zookeeper-3.5.5/apache-zookeeper-3.5.5-bin.tar.gz
I renamed zoo_sample.cfg to zoo.cfg; no changes were made to the cfg
I tried to start zookeeper as root:
[root@pocdnr1n1 apache-zookeeper-3.5.5-bin]# bin/zkServer.sh start conf/zoo.cfg
I received error in log:
Could not find the main class: org.apache.zookeeper.server.quorum.QuorumPeerMain. Program will exit.
Thanks.
I think I have found the root cause; posting it here for future readers:
The culprit is Java.
I had an old Java version on the node; this can be verified by running java -version.
In my case the Java was 1.6. What I did was reset the environment to add the new Java 1.8:
# export JAVA_HOME=/opt/jdk1.8.0_151
# export JRE_HOME=/opt/jdk1.8.0_151/jre/
# export PATH=$PATH:/opt/jdk1.8.0_151/bin:/opt/jdk1.8.0_151/jre/bin
You should add these to your .bash_profile so that they become permanent.
After that, run source .bash_profile; now Java 1.8 is the default. Again, you can confirm this by running java -version.
Run ZooKeeper again and it will start as expected.
The common mistakes here are:
downloading the non-binary ZooKeeper package
a Java version that is too low (1.6 doesn't work; 1.8 is recommended)
an environment not set to ensure Java 1.8 is picked up as the default version
I hope this helps.
If you look at ZooKeeper Administrator's Guide - Required Software for 3.5.5 it says:
ZooKeeper runs in Java, release 1.7 or greater (JDK 7 or greater, FreeBSD support requires openjdk7).
which affirms what you found out. Your Java version was too low.
What worked for me was rebuilding with ./gradlew jar -PscalaVersion=2.13.10
I was using Kafka straight from source from github.com/apache/kafka, and that step was given to me when I cloned a fresh copy of the repo to start from scratch.
I had done a git pull on my old version, which broke the dependencies I had forgotten I must have installed at some point.

Add Java in Python Flask Cloud Foundry

I need to run a java command from a Python Flask application which is deployed using cf. How can we make a Java runtime available to this Python Flask app?
I tried using a multi-buildpack, but java_buildpack expects some JAR or WAR to execute when deploying the application.
Is there any approach that would make Java available to the Python Flask app?
The last buildpack in the buildpack chain is responsible for determining a command to start your application which is why the Java buildpack expects a JAR/WAR to execute.
The Java buildpack, as of writing this, does not ship a supply script, so it can only run as the last buildpack when using multi-buildpack support. It looks like at some point in the future the Java buildpack will provide a supply script, but this is still being worked out here.
For now, what you can do is use the apt-buildpack and install a JRE/JDK that way.
To do this, add a file called apt.yml to the root of your project folder. In that file, put the following:
---
packages:
- openjdk-8-jre
repos:
- deb http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty main
keys:
- https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xEB9B1D8886F44E2A
This will tell the apt buildpack to add a PPA for Ubuntu Trusty where we can get the latest openjdk8. This gets installed under /home/vcap/deps/0, which puts the java executable at /home/vcap/deps/0/lib/jvm/java-8-openjdk-amd64/bin/java.
Note: The java binary is unfortunately not on the path because of the way Ubuntu uses update-alternatives and we can't use that tool to put it on the path in the CF app container because we don't have root access.
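Since the binary is not on the path, one workaround is a .profile script in the app root, which Cloud Foundry sources before running the start command — a sketch, assuming the apt-buildpack deps layout described above (the deps index and JVM directory may differ between buildpack versions):

```shell
# .profile -- sourced by Cloud Foundry before the app's start command runs.
# The path below assumes the apt-buildpack ran first (deps index 0).
export JAVA_HOME=/home/vcap/deps/0/lib/jvm/java-8-openjdk-amd64
export PATH="$JAVA_HOME/bin:$PATH"
```

With that in place, the Flask app can invoke java by name instead of the full deps path.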
After setting that up, you'd follow the normal instructions for using multiple buildpacks.
$ cf push YOUR-APP --no-start -b binary_buildpack
$ cf v3-push YOUR-APP -b https://github.com/cloudfoundry/apt-buildpack#v0.1.1 -b python_buildpack
Note: The process to push using multiple buildpacks will likely change in the future and v3-push, which is currently experimental, will go away.
Note: The example above hard codes version v0.1.1 of the apt buildpack. You should use the latest stable release, which you can find here. Using the master branch is not recommended.
One way to achieve your goal of combining Java and Python would be context-based routing. I have an example that combines Python and Node.js, but the approach is the same.
Basically, you have a second app serving one or more paths of a domain / URI.

Trouble with kafka example

I downloaded Apache Kafka and successfully completed the quickstart . When i tried running the example program provided, I got an error :
cannot load main class.
I don't know what I'm missing; I even set the classpath.
Run this command if you are on Amazon EC2:
sudo yum install java-1.6.0-openjdk-devel
javac must be available. For that you need the JDK, not just the JRE; you probably only have the JRE installed.
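A quick way to tell whether the full JDK is present (and not just the JRE) is to check for javac, which only ships with the JDK:

```shell
# javac ships only with the JDK; its absence means only the JRE is installed
if command -v javac >/dev/null 2>&1; then
  echo "JDK present"
else
  echo "JDK missing - install the openjdk devel package"
fi
```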