
Friday, September 4, 2015

Setting up a Mesos-0.9.0 Cluster

Apart from running in Standalone mode, Spark can also run on clusters managed by Apache Mesos. "Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks." 

In addition to Spark, Hadoop, MPI and Hypertable also support running on clusters managed by Apache Mesos. 

Listed below are the steps for deploying a Mesos Cluster. These should be run on the node intended to be the Master node of the Mesos Cluster.

1. Mesos 0.9.0-incubating can be downloaded from:
http://archive.apache.org/dist/incubator/mesos/mesos-0.9.0-incubating/                                                                                                   

2. Extract Mesos setup
tar -xvzf  mesos-0.9.0-incubating.tar.gz                                                                                                                                                                                                    

3. Change the current working directory to the extracted Mesos setup for its compilation.
cd mesos-0.9.0                                                                                                                                                                                                                                                         

The JAVA_HOME to be used needs to be specified while configuring Mesos. This can be done by passing the command line option "--with-java-home" to the configure command as shown below:
./configure --with-java-home=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64                                                                                                  

After configuring, run the following two commands:
make 
sudo make install                                                                                                                                                                                        

The following files and directories should have been created, if the above steps have been executed successfully:
/usr/local/lib/libmesos.so 
/usr/local/sbin/mesos-daemon.sh  
/usr/local/sbin/mesos-slave             
/usr/local/sbin/mesos-start-masters.sh  
/usr/local/sbin/mesos-stop-cluster.sh 
/usr/local/sbin/mesos-stop-slaves.sh
/usr/local/sbin/mesos-master     
/usr/local/sbin/mesos-start-cluster.sh  
/usr/local/sbin/mesos-start-slaves.sh   
/usr/local/sbin/mesos-stop-masters.sh
/usr/local/var/mesos/conf/mesos.conf.template
/usr/local/var/mesos/deploy                                                                                                                                                                                                    

4. Add the MESOS_NATIVE_LIBRARY variable declaration to "spark-env.sh" in Spark's "conf" directory as shown below:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so                                                                                                                                           

5. Copy the Mesos setup to the same location on all the nodes to be included in the Mesos cluster, or simply set up Mesos on each of them by running:
cd mesos-0.9.0
sudo make install                                                                                                                                                                                                                                           

This completes the process of setting up a Mesos Cluster.

Configure Mesos for deployment


1. On the Mesos Cluster's master node, edit the file "/usr/local/var/mesos/deploy/masters" to list the IP of the Master node and "/usr/local/var/mesos/deploy/slaves" to list the IPs of the slaves.

2. On all nodes of the Mesos Cluster, edit "/usr/local/var/mesos/conf/mesos.conf" (created from the mesos.conf.template) and add the line master=HOST:5050, where HOST is the IP of the Mesos Cluster's master node.

This is the end of the Configuration Phase.
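With the configuration done, the cluster can be brought up using the deploy scripts that "make install" created under /usr/local/sbin, and a Spark shell can then be pointed at it. A minimal sketch; the master IP 192.10.0.1 is an assumption, replace it with your master node's IP:

```shell
# Run on the master node; starts the master and a slave on every node
# listed in /usr/local/var/mesos/deploy/slaves (assumes passphraseless ssh).
/usr/local/sbin/mesos-start-cluster.sh

# Connect a Spark shell to the cluster by setting MASTER to the
# mesos:// URL of the master node (5050 is the master's port).
MASTER=mesos://192.10.0.1:5050 ./spark-shell

# Bring the whole cluster down again when done.
/usr/local/sbin/mesos-stop-cluster.sh
```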

Good luck.

Setting up Spark-0.7.x in Standalone Mode


A Spark Cluster in Standalone Mode comprises one Master and multiple Spark Worker processes. Standalone mode can be used both on a single local machine and on a cluster. This mode does not require any external resource manager such as Mesos.



To deploy a Spark Cluster in Standalone mode, the following steps need to be executed on any one of the nodes.

1. Download the spark-0.7.x setup from: 
http://spark.apache.org/downloads.html

2. Extract the Spark setup
tar -xzvf spark-0.7.x-sources.tgz

3. Spark requires Scala's bin directory to be present in the PATH variable of the Linux machine. Scala 2.9.3 for Linux can be downloaded from:
http://www.scala-lang.org/downloads

4. Extract the Scala setup  
tar -xzvf scala-2.9.3.tgz

5. Export the Scala home by appending the following line into "~/.bashrc" (for CentOS) or "/etc/environment" (for Ubuntu)
export SCALA_HOME=/location_of_extracted_scala_setup/scala-2.9.3

6. Spark can be compiled with "sbt" or built using Maven. This module uses the former method because of its simplicity of execution. To compile, change directory to the extracted Spark setup and execute the following command:
sbt/sbt package

7. Create a file (if not already present) called "spark-env.sh" in Spark’s "conf" directory, by copying "conf/spark-env.sh.template", and add the SCALA_HOME variable declaration to it as described below:
export SCALA_HOME=<path to Scala directory>

The Web UI port for the Spark Master and Worker can also be optionally specified by appending the following to "spark-env.sh"
export SPARK_MASTER_WEBUI_PORT=8083
export SPARK_WORKER_WEBUI_PORT=8084

8. To specify the nodes which would behave as the Workers, the IPs of the nodes are to be mentioned in "conf/slaves". For a cluster containing two worker nodes with IPs 192.10.0.1 and 192.10.0.2, "conf/slaves" would contain:
192.10.0.1
192.10.0.2

This completes the setup process on one node. 

To set up Spark on the other nodes of the cluster, copy the Spark and Scala setups to the same locations on the rest of the nodes.

Lastly, edit the /etc/hosts file on all the nodes to add the "IP HostName" entries of all the other nodes in the cluster.
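Once all nodes are configured, the cluster can be launched from the master node with Spark's standalone deploy scripts. A minimal sketch; the master IP 192.10.0.1 is taken from the example above, replace it with yours:

```shell
# Run on the master node; starts a Master locally and a Worker on
# every node listed in conf/slaves (assumes passphraseless ssh).
bin/start-all.sh

# Check the Master's web UI (port 8083 if set in spark-env.sh as
# above, 8080 by default) to confirm the Workers have registered,
# then connect a Spark shell using the standalone master URL.
MASTER=spark://192.10.0.1:7077 ./spark-shell

# Stop the Master and all Workers.
bin/stop-all.sh
```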

Hope that helps !!

Thursday, July 16, 2015

Installing Hadoop-1.x.x in Pseudo-Distributed Mode

Disclaimer: The installation steps shared in this blog post are typically for the hadoop-1.x.x series. If you are looking for hadoop-2.x.x series installation steps i.e. with YARN, this post isn’t the right place.

Hadoop installation can be done in the following three modes. This post elaborates the Pseudo-Distributed Mode.
  • Standalone Mode
In this mode Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging. This is the default mode for Hadoop.
  • Pseudo-Distributed Mode
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process. Such a mode is called Pseudo-Distributed mode.
  • Fully-Distributed Mode
In this mode we install, configure and manage non-trivial Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes.

Supported Platforms


GNU/Linux is supported as a development and production platform.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Required Software


Required software for Linux and Windows includes:
  • Java™ 1.6.x, preferably from Sun, must be installed.
  • ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
  • An additional requirement for Windows is Cygwin. It's required for shell support in addition to the software listed above.

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

          For example on Ubuntu Linux:


$ sudo apt-get install ssh
$ sudo apt-get install rsync                      

On Windows, if you did not install the required software when you installed Cygwin, start the Cygwin installer and select the package:
          openssh (from the Net category)

Download Hadoop


Obtain a Hadoop-1.x.x stable release from http://hadoop.apache.org/releases.html
  • Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
  • Try the following command:

$ bin/hadoop                                                      

        This will display the usage documentation for the hadoop script.
  • Now you are ready to start your Hadoop cluster in one of the three supported modes:
  1. Local (Standalone) Mode
  2. Pseudo-Distributed Mode
  3. Fully-Distributed Mode
For the Pseudo-Distributed Mode we need to configure 3 files, namely:


  • conf/core-site.xml
<configuration>  
      <property>     
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>                                                                     
      </property>
 </configuration>
  • conf/hdfs-site.xml
<configuration>
      <property>
            <name>dfs.replication</name>    
             <value>1</value>   
      </property>
</configuration>
  • conf/mapred-site.xml
    <configuration>   
         <property>     
              <name>mapred.job.tracker</name>     
              <value>localhost:9001</value>   
          </property>
    </configuration>


Setup Passphraseless SSH


          We need to ssh to the localhost without a passphrase:
             $ ssh localhost 
          If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
(This command generates a DSA key pair; the private key is stored in ~/.ssh/id_dsa and the public key in ~/.ssh/id_dsa.pub)
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(This command copies the public key to the set of authorized keys of the system)
This completes the ssh passphraseless setup process.


Start and Stop the Cluster

  • Format a new distributed-file system:
$ bin/hadoop namenode -format
  • Start the hadoop daemons:
$ bin/start-all.sh
  •  Accessible UI
NameNode UI Port - 50070
JobTracker UI Port – 50030
  • Stop the hadoop daemons:
$ bin/stop-all.sh
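After starting the daemons, it is worth confirming that all five of them came up and that HDFS accepts commands. A quick check (the order of the jps output may differ on your machine, and the /tmp/smoketest path is just an example):

```shell
# jps lists the running Java processes; a healthy pseudo-distributed
# setup shows NameNode, DataNode, SecondaryNameNode, JobTracker and
# TaskTracker (plus jps itself).
jps

# Exercise HDFS with a trivial round trip.
bin/hadoop fs -mkdir /tmp/smoketest
bin/hadoop fs -ls /tmp
```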


Friday, June 26, 2015

Installation Script for Apache Storm on Ubuntu

One of my blogs here describes the steps for manual installation of a Storm cluster. To make things more convenient, here's an installation script that you can use for setting up a Storm cluster on Linux machines. Although the script should work for older versions of Apache Storm, it has been tested with storm-0.9.0-wip4. The script prints a descriptive message for each input it expects from you. The installation is done in the '/opt' folder of the machines, in a sub-directory of your choice. Make sure the user installing the cluster has admin rights on the /opt folder. The script also takes care of installing all the required dependencies. To use the script with versions other than the supported one, replace every "storm-0.9.0-wip4" occurrence in the script with your Storm version.


#!/bin/bash
# Local FS Setups location
echo "Enter the location of the setups folder. For example '/home/abc/storminstallation/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory name"
read -e realTimePlatformDir
rtpLocalDir="\/opt\/$realTimePlatformDir\/storm\/storm_temp"
rtpLocalDirMake=/opt/$realTimePlatformDir/storm/storm_temp
echo $rtpLocalDir;
echo "Enter the IP of nimbus machine :"
read -e stormNimbus;
array[0]=$stormNimbus


# Read supervisor
echo "Enter the number of supervisor machines";
read -e n;
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the IP of storm supervisor machine $i:"
read -e stormSupervisor;
array[i]=$stormSupervisor
done

# Read zookeeper
echo "Enter the number of machines in the zookeeper cluster";
read -e m;
for ((  i = 1 ;  i <= m;  i++  ))
do
echo "Enter the IP of zookeeper machine $i:"
read -e zkServer;
zkEntry="- \""$zkServer"\""
zKArray=$zKArray","$zkEntry
done

# Copy the required setups to all the storm machines
for ((  i = 1 ;  i <= n+1;  i++  ))
do
echo "Enter the username of machine ${array[i-1]}"
read -e username
echo "Username:"$username
if [ "$username" == 'root' ]; then
echo 'root';
yamlFilePath="/root/.storm";
else
echo $username;
yamlFilePath="/home/$username/.storm";
fi
echo "the storm.yaml file would be formed at : $yamlFilePath";
echo "Enter the value for JAVA_HOME to be set on the machine ${array[i-1]}"
read -e javaHome;
echo 'JAVA_HOME would be set to :'$javaHome;
ssh -t $username@${array[i-1]} "if [ ! -d /opt/$realTimePlatformDir ]; then
           sudo mkdir /opt/$realTimePlatformDir;
           sudo chown -R $username: /opt/$realTimePlatformDir;
           mkdir /opt/$realTimePlatformDir/storm;
           mkdir $rtpLocalDirMake;
           mkdir $yamlFilePath;
        fi"
     
scp -r -q $setupsLocation/storm-0.9.0-wip4 $username@${array[i-1]}:/opt/$realTimePlatformDir/storm/storm-0.9.0-wip4
     
ssh -t $username@${array[i-1]} "sed -i 's/ZOOKEEPER_IPS/$zKArray/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/,/\n/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/NIMBUS_IP/$stormNimbus/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/LOCAL_DIR/$rtpLocalDir/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
cp /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml $yamlFilePath;
sudo apt-get install git;
sudo apt-get install uuid-dev;"

ssh -t $username@${array[i-1]} "cd /opt/$realTimePlatformDir/storm;
                        wget http://download.zeromq.org/zeromq-2.1.7.tar.gz;
                        tar -xzf zeromq-2.1.7.tar.gz
                        cd zeromq-2.1.7                    
                        ./configure
                        make
                        sudo make install

                        cd ..
                        export JAVA_HOME=$javaHome;
                        echo $JAVA_HOME;
                        git clone https://github.com/nathanmarz/jzmq.git
                        cd jzmq
                        ./autogen.sh
                        ./configure
                        make
                        sudo make install"

done
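Assuming the script above is saved as, say, install_storm.sh (the filename is an assumption), it is run like any other shell script and prompts for each value interactively:

```shell
# Make the script executable and run it; it will ask for the setups
# folder, the installation directory name, and the nimbus, supervisor
# and zookeeper IPs in turn.
chmod +x install_storm.sh
./install_storm.sh
```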


Yep Done! Hope it helped. My next post shares the installation script for CentOS, and the one after that a small start-up script for the installed Storm cluster.

Installation Script for Apache Zookeeper-3.3.5 on Linux

One of my previous blogs describes how to set up a Zookeeper cluster manually. Here's a quick fix: an installation script for the same. Store the following script in a .sh file, run it on your machine, and you can install a zookeeper cluster on a set of remote machines. The only prerequisite on those machines is the zookeeper-3.3.5 setup at a common location. You can also use this script for other versions with a bit of modification (replace the version used in the script with yours; it shouldn't have been hard-coded, I know.. my bad).



#!/bin/bash

zkServerEntryPart1="server.";
zkServerEntryPart2="=zoo";
zkServerEntryPart3=":2888:3888NEW_LINE";
zkServerEntry="";

# Local FS Setup location
echo "Enter the path of the folder in which the zookeeper setup is stored. For example '/home/abc/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory path where zookeeper is to be installed : "
read -e zookeeperSetupLocation

# Read the number of zookeeper servers
echo "Enter the number of machines in the zookeeper cluster : ";
read -e n;

# Read the zookeeper server details
for ((  i = 1 ;  i <= n;  i++  ))
do
# obtain the zookeeper server ips of all machines in the cluster
echo "Enter the IP of zookeeper machine $i:"
read -e zookeeperServer;
zookeeperServerIPList[i]=$zookeeperServer
temp=$zkServerEntryPart1""$i""$zkServerEntryPart2""$i""$zkServerEntryPart3;
zkServerEntry=$zkServerEntry""$temp;

# obtain the usernames for all the zookeeper servers
echo "Enter the username of machine ${zookeeperServerIPList[i]}"
read -e username
userNameList[i]=$username
done

# Copy the setup to all the zookeeper machines
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the data directory location of zookeeper for the machine ${zookeeperServerIPList[i]}"
read -e dataDir
     
# create the required folders on the machines
ssh -t ${userNameList[i]}@${zookeeperServerIPList[i]} "if [ ! -d $zookeeperSetupLocation ]; then
sudo mkdir $zookeeperSetupLocation;
sudo chown -R ${userNameList[i]}: $zookeeperSetupLocation;
fi
if [ ! -d $dataDir ]; then
sudo mkdir $dataDir;
sudo chown -R ${userNameList[i]}: $dataDir;
fi"

# copy the zookeeper setup at the specified location on the machines
scp -r -q $setupsLocation/zookeeper-3.3.5 ${userNameList[i]}@${zookeeperServerIPList[i]}:$zookeeperSetupLocation/zookeeper-3.3.5

# create and configure the 'zoo.cfg' and 'myid' files
ssh -t ${userNameList[i]}@${zookeeperServerIPList[i]} "touch $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "dataDir=$dataDir" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "syncLimit=2" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "initLimit=5" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "clientPort=2181" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo  "$zkServerEntry" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
sed -i 's/NEW_LINE/\n/g' $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg
touch $dataDir/myid;                           
echo -e $i >> $dataDir/myid;"

# update the /etc/hosts file
hostFileEntry=${zookeeperServerIPList[i]}" zoo"$i;
for ((  j = 1 ;  j <= n;  j++  ))
do
ssh -t ${userNameList[j]}@${zookeeperServerIPList[j]} "sudo cp /etc/hosts /etc/hosts.bak;
sudo cp /etc/hosts /etc/hosts1;
sudo chmod 777 /etc/hosts1;
sudo echo -e "$hostFileEntry" >> /etc/hosts1;
sudo mv /etc/hosts1 /etc/hosts;"
done
done
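The server.N lines that the script assembles for zoo.cfg (via the NEW_LINE placeholder and the later sed) end up in the standard Zookeeper form. The fragment below builds the same entries directly, for a hypothetical three-node ensemble:

```shell
# Build the server.N=zooN:2888:3888 entries the script writes into
# zoo.cfg; 2888 is the quorum/peer port and 3888 the leader-election
# port, and zooN are the hostnames the script adds to /etc/hosts.
n=3
entries=""
for (( i = 1; i <= n; i++ )); do
  entries="${entries}server.${i}=zoo${i}:2888:3888"$'\n'
done
printf '%s' "$entries"
```

For n=3 this prints server.1=zoo1:2888:3888 through server.3=zoo3:2888:3888, one per line, matching what the installed zoo.cfg should contain.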

Hope it helped. My next blog post is about a small script to start up the installed zookeeper cluster.