New to Storm? My previous post could help you find your feet. In this post, we'll go the extra mile and install Storm. There are two aspects to this:
- Setting up Storm locally
- Setting up a Storm cluster
Let's begin with setting up Storm locally, which is hardly more than a two-step procedure.
Setting up Storm locally
This is kind of mandatory!
That's because even if your aim is to get topologies working on a cluster, submitting topologies to that cluster requires a 'storm client', which in turn requires Storm to be set up locally on your system.
Moreover, it is always better to dry-run topologies on your local system before deploying them as a jar on the cluster. It saves you from exhausting debugging on the cluster. Moving on, we'll undertake the following two tasks under this heading:
- Setting up Storm for running topologies on the local machine
- Setting up the Storm client
As a prerequisite, you need a Linux machine with Java 6 installed.
The steps for accomplishing the first task:
- Download a Storm release from https://github.com/nathanmarz/storm/downloads
- Unzip it, cd to the unzipped Storm directory, and test that bin/storm is executable by running any of these:
- bin/storm ui
- bin/storm supervisor
- bin/storm nimbus
Next, to get the ball rolling on running topologies in Storm, the best place to start is the 'storm-starter' project in Eclipse. The steps are:
- Obtain the storm-starter project from the following location:
- Add the storm-0.5.*.jar and the other required jars present in the Storm setup to the build path of your Eclipse project.
- If you want to start with the simplest thing that could possibly work, the simplest part of this project, 'WordCountTopology.java', will do the trick.
- Since this topology uses the 'SplitSentence' bolt, which is implemented in Python, here's a Java substitute for the 'SplitSentence' class if you prefer Java:
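// Nested inside WordCountTopology. Needs these imports at the top of the file:
//   java.util.Map
//   backtype.storm.task.OutputCollector, backtype.storm.task.TopologyContext
//   backtype.storm.topology.IRichBolt, backtype.storm.topology.OutputFieldsDeclarer
//   backtype.storm.tuple.Fields, backtype.storm.tuple.Tuple, backtype.storm.tuple.Values
public static class SplitSentence implements IRichBolt {
    OutputCollector _collector;

    // Called once when the bolt starts; keep the collector for emitting later.
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }

    // Emit one tuple per word, anchored to the input tuple, then ack the input.
    public void execute(Tuple tuple) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            _collector.emit(tuple, new Values(word));
        }
        _collector.ack(tuple);
    }

    public void cleanup() {
    }

    // Each emitted tuple carries a single field named "word".
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}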
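To see what this bolt contributes without needing Storm on the classpath, here is a minimal plain-Java sketch of the same data flow: sentences are split on single spaces, exactly as in execute() above, and a running count is kept per word, which is roughly what a counting bolt further down the topology would do. The class name WordCountSketch, its method names and the sample sentences are made up for illustration; none of this is Storm API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of storm-starter.
class WordCountSketch {

    // Mirrors SplitSentence.execute(): one output word per space-separated token.
    static String[] splitSentence(String sentence) {
        return sentence.split(" ");
    }

    // Keeps a running count per word across all sentences.
    static Map<String, Integer> countWords(String[] sentences) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String sentence : sentences) {
            for (String word : splitSentence(sentence)) {
                Integer previous = counts.get(word);
                counts.put(word, previous == null ? 1 : previous + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch.
        String[] sentences = {
            "the cow jumped over the moon",
            "an apple a day keeps the doctor away"
        };
        System.out.println(countWords(sentences));
    }
}
```

In the real topology the split and the count happen in separate bolts that may run on different machines; the sketch only shows the computation, not the distribution.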
Successfully accomplishing this leaves you with a verified environment for testing and running any Storm topology locally.
Setting up the Storm client
Communicating with a remote cluster and submitting topologies to it requires a Storm client on your system.
For this, configure the 'storm.yaml' file located in your Storm setup's conf folder by adding the following line to it, and place a copy of the file at '~/.storm/storm.yaml':
nimbus.host: "ip_of_your_remote_cluster's_nimbus"
For example:
nimbus.host: "195.168.78.78"
As an important note, also check the permissions of this file so that it is readable by the user running the Storm client.
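A quick sanity check of the client configuration can be scripted. The sketch below is only an illustration, written to compile on Java 6: it looks for ~/.storm/storm.yaml, checks that the file is readable, and pulls out the nimbus.host value with a naive line-based scan (it is not a real YAML parser). The class name StormClientConfigCheck and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper; not shipped with Storm.
class StormClientConfigCheck {

    // Returns the value of the first uncommented "nimbus.host:" line, or null if there is none.
    static String nimbusHost(String yamlText) {
        for (String line : yamlText.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("nimbus.host:")) {
                String value = trimmed.substring("nimbus.host:".length()).trim();
                // Strip one pair of surrounding straight double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // Describes the state of the given config file in one line.
    static String check(File config) {
        if (!config.isFile()) {
            return "missing: " + config;
        }
        if (!config.canRead()) {
            return "not readable: " + config;
        }
        StringBuilder text = new StringBuilder();
        try {
            BufferedReader reader = new BufferedReader(new FileReader(config));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    text.append(line).append('\n');
                }
            } finally {
                reader.close();
            }
        } catch (IOException e) {
            return "could not read: " + e.getMessage();
        }
        String host = nimbusHost(text.toString());
        return host == null ? "no nimbus.host entry" : "ok, nimbus.host = " + host;
    }

    public static void main(String[] args) {
        File config = new File(System.getProperty("user.home"), ".storm/storm.yaml");
        System.out.println(check(config));
    }
}
```

If the reported host still carries a quote character, the value in the file is probably wrapped in mismatched or typographic quotes, which is an easy mistake to make when copying the line from a web page.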
Now you should be able to deploy jars on any remote cluster (steps to set up a remote cluster are listed later in the post) using:
cd /path_to_your_storm_setup
bin/storm jar location_of_jar_on_your_system/WordCount.jar storm.starter.WordCountTopology
and kill running topologies using:
bin/storm kill wordcount
Setting up a Storm Cluster
Time to kick off with setting up a Storm cluster. Here I am assuming a cluster of 3 machines, of which one is the master node, i.e. nimbus, and the other two are the worker nodes.
Prerequisites:
- Java 6 and Python 2.6
- JAVA_HOME should be set (for example in ~/.bashrc) if it is not set already
These should be installed on all the machines of the cluster.
Installation steps:
- Set up the Zookeeper cluster:
Zookeeper is the coordinator for a Storm cluster. The interaction between nimbus and the worker nodes goes through Zookeeper, so it's compulsory to set up a Zookeeper cluster first. You can follow the instructions here:
http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
- Install native dependencies
In local mode, Storm uses a pure Java messaging system, so you don't need to install native dependencies on your development machine. On a cluster, however, ZeroMQ and JZMQ are prerequisites on all the nodes, including nimbus.
Download and installation commands for ZeroMQ 2.1.7:
Download and installation commands for JZMQ:
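Once ZeroMQ and JZMQ are installed, one way to confirm that the JVM can actually find the JZMQ native library is to try loading it. This is a minimal sketch under two assumptions: that the library was installed under /usr/local/lib (the java.library.path used in the storm.yaml files in this post) and that its load name is "jzmq". The class name NativeLibCheck is hypothetical.

```java
// Hypothetical helper; not shipped with Storm or JZMQ.
class NativeLibCheck {

    // Returns true if the JVM can load the named native library from java.library.path.
    static boolean canLoad(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError e) {
            // Library not found on java.library.path, or one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // "jzmq" is the assumed load name of the JZMQ native library.
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        System.out.println("jzmq loadable: " + canLoad("jzmq"));
    }
}
```

Run it on each node with the same library path Storm will use, for example: java -Djava.library.path=/usr/local/lib NativeLibCheck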
- Copy the Storm setup to all the machines in the cluster.
Assume the following IPs for clarity:
nimbus IP: A.B.C.Nimbus
supervisor node IPs: A.B.C.Sup1 and A.B.C.Sup2
Edit the conf/storm.yaml file as follows:
"storm.yaml" file for the master node/nimbus:
storm.zookeeper.servers:
  - "A.B.C.Sup1"
  - "A.B.C.Sup2"
storm.local.dir: "path_to_any_dir_for_temp_storage"
java.library.path: "/usr/local/lib/"
nimbus.host: "127.0.0.1"
nimbus.task.launch.secs: 240
supervisor.worker.start.timeout.secs: 240
supervisor.worker.timeout.secs: 240
"storm.yaml" file for all worker nodes:
storm.zookeeper.servers:
  - "A.B.C.Sup1"
  - "A.B.C.Sup2"
storm.local.dir: "path_to_any_dir_for_temp_storage"
java.library.path: "/usr/local/lib/"
nimbus.host: "A.B.C.Nimbus"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
Note: Also copy this storm.yaml file to the "~/.storm/" folder on the respective systems.
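Because the worker file is identical on every supervisor node and the YAML list layout is easy to get wrong when copying by hand, it can help to generate the text. The following is a rough sketch that renders only the keys the worker-node file in this post uses; StormYamlSketch and render are hypothetical names, the java.library.path value is hard-coded to the one used above, and the nimbus-only timeout settings are deliberately left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical generator; not shipped with Storm.
class StormYamlSketch {

    // Renders a storm.yaml body; pass an empty port array to skip supervisor.slots.ports.
    static String render(List<String> zookeeperServers, String localDir,
                         String nimbusHost, int[] slotPorts) {
        StringBuilder yaml = new StringBuilder();
        yaml.append("storm.zookeeper.servers:\n");
        for (String server : zookeeperServers) {
            // One list item per line, indented under the key.
            yaml.append("  - \"").append(server).append("\"\n");
        }
        yaml.append("storm.local.dir: \"").append(localDir).append("\"\n");
        yaml.append("java.library.path: \"/usr/local/lib/\"\n");
        yaml.append("nimbus.host: \"").append(nimbusHost).append("\"\n");
        if (slotPorts.length > 0) {
            yaml.append("supervisor.slots.ports:\n");
            for (int port : slotPorts) {
                yaml.append("  - ").append(port).append('\n');
            }
        }
        return yaml.toString();
    }

    public static void main(String[] args) {
        // Placeholder hosts and directory, as used in the post.
        List<String> zookeepers = Arrays.asList("A.B.C.Sup1", "A.B.C.Sup2");
        System.out.print(render(zookeepers, "path_to_any_dir_for_temp_storage",
                "A.B.C.Nimbus", new int[] {6700, 6701, 6702, 6703}));
    }
}
```

Each port under supervisor.slots.ports corresponds to one worker process that a supervisor node may run, so four ports on each of the two supervisor nodes gives the cluster eight worker slots in total.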
This completes the cluster setup and you can now submit topologies from your system to it after creating a jar. For further assistance in this follow :
That's all from my end... Hope it was helpful!