
Friday, September 4, 2015

Setting up a Mesos-0.9.0 Cluster

Apart from running in Standalone mode, Spark can also run on clusters managed by Apache Mesos. "Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks." 

In addition to Spark, Hadoop, MPI and Hypertable also support running on clusters managed by Apache Mesos. 

Listed below are the steps for deploying a Mesos Cluster. These should be run on the node intended to be the Master node of the Mesos Cluster.

1. Mesos 0.9.0-incubating can be downloaded from:
http://archive.apache.org/dist/incubator/mesos/mesos-0.9.0-incubating/                                                                                                   

2. Extract Mesos setup
tar -xvzf  mesos-0.9.0-incubating.tar.gz                                                                                                                                                                                                    

3. Change the current working directory to the extracted Mesos setup for its compilation.
cd mesos-0.9.0                                                                                                                                                                                                                                                         

The JAVA_HOME to be used needs to be specified while configuring Mesos. This can be done by passing the command line option "--with-java-home" to the configure command as shown below:
./configure --with-java-home=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64                                                                                                  

After configuring, run the following two commands:
make 
sudo make install                                                                                                                                                                                        

The following files and directories should have been created, if the above steps have been executed successfully:
/usr/local/lib/libmesos.so 
/usr/local/sbin/mesos-daemon.sh  
/usr/local/sbin/mesos-slave             
/usr/local/sbin/mesos-start-masters.sh  
/usr/local/sbin/mesos-stop-cluster.sh 
/usr/local/sbin/mesos-stop-slaves.sh
/usr/local/sbin/mesos-master     
/usr/local/sbin/mesos-start-cluster.sh  
/usr/local/sbin/mesos-start-slaves.sh   
/usr/local/sbin/mesos-stop-masters.sh
/usr/local/var/mesos/conf/mesos.conf.template
/usr/local/var/mesos/deploy                                                                                                                                                                                                    

4. Add the MESOS_NATIVE_LIBRARY variable declaration to "spark-env.sh" in Spark's "conf" directory as shown below:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so                                                                                                                                           

5. Copy the Mesos setup to the same location on all the nodes to be included in the Mesos cluster, or simply set up Mesos on each of them by running:
cd mesos-0.9.0
sudo make install                                                                                                                                                                                                                                           

This completes the process of setting up a Mesos Cluster.

Configure Mesos for deployment


1. On the Mesos Cluster's master node, edit the file "/usr/local/var/mesos/deploy/masters" to list the IP of the Master node and "/usr/local/var/mesos/deploy/slaves" to list the IPs of the slaves.

2. On all nodes of the Mesos Cluster, edit "/usr/local/var/mesos/conf/mesos.conf" (created from the mesos.conf.template) and add the line master=HOST:5050, where HOST is the IP of the Mesos Cluster's master node.

This is the end of the Configuration Phase.
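With the configuration done, the cluster can be brought up using the deploy scripts that "make install" created under /usr/local/sbin, and a Spark shell can then be pointed at it. A minimal sketch; the master IP 192.10.0.1 is an assumption, replace it with your master node's IP:

```shell
# Run on the master node; starts the master and a slave on every node
# listed in /usr/local/var/mesos/deploy/slaves (assumes passphraseless ssh).
/usr/local/sbin/mesos-start-cluster.sh

# Connect a Spark shell to the cluster by setting MASTER to the
# mesos:// URL of the master node (5050 is the master's port).
MASTER=mesos://192.10.0.1:5050 ./spark-shell

# Bring the whole cluster down again when done.
/usr/local/sbin/mesos-stop-cluster.sh
```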

Good luck.

Setting up Spark-0.7.x in Standalone Mode


A Spark Cluster in Standalone Mode comprises one Master and multiple Spark Worker processes. Standalone mode can be used both on a single local machine and on a cluster. This mode does not require any external resource manager such as Mesos.



To deploy a Spark Cluster in Standalone mode, the following steps need to be executed on any one of the nodes.

1. Download the spark-0.7.x setup from: 
http://spark.apache.org/downloads.html

2. Extract the Spark setup
tar -xzvf spark-0.7.x-sources.tgz

3. Spark requires Scala's bin directory to be present in the PATH variable of the Linux machine. Scala 2.9.3 for Linux can be downloaded from:
http://www.scala-lang.org/downloads

4. Extract the Scala setup  
tar -xzvf scala-2.9.3.tgz

5. Export the Scala home by appending the following line into "~/.bashrc" (for CentOS) or "/etc/environment" (for Ubuntu)
export SCALA_HOME=/location_of_extracted_scala_setup/scala-2.9.3

6. Spark can be compiled with "sbt" or built using Maven. This module uses the former method because of its simplicity of execution. To compile, change directory to the extracted Spark setup and execute the following command:
sbt/sbt package

7. Create a file (if not already present) called "spark-env.sh" in Spark’s "conf" directory, by copying "conf/spark-env.sh.template", and add the SCALA_HOME variable declaration to it as described below:
export SCALA_HOME=<path to Scala directory>

The Web UI port for the Spark Master and Worker can also be optionally specified by appending the following to "spark-env.sh"
export SPARK_MASTER_WEBUI_PORT=8083
export SPARK_WORKER_WEBUI_PORT=8084

8. To specify the nodes which would behave as the Workers, the IPs of the nodes are to be mentioned in "conf/slaves". For a cluster containing two worker nodes with IPs 192.10.0.1 and 192.10.0.2, "conf/slaves" would contain:
192.10.0.1
192.10.0.2

This completes the setup process on one node. 

To set up Spark on the other nodes of the cluster, copy the Spark and Scala setups to the same locations on the rest of the nodes.

Lastly, edit the /etc/hosts file on all the nodes to add the "IP HostName" entries of all the other nodes in the cluster.
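Once all nodes are configured, the cluster can be launched from the master node with Spark's standalone deploy scripts. A minimal sketch; the master IP 192.10.0.1 is taken from the example above, replace it with yours:

```shell
# Run on the master node; starts a Master locally and a Worker on
# every node listed in conf/slaves (assumes passphraseless ssh).
bin/start-all.sh

# Check the Master's web UI (port 8083 if set in spark-env.sh as
# above, 8080 by default) to confirm the Workers have registered,
# then connect a Spark shell using the standalone master URL.
MASTER=spark://192.10.0.1:7077 ./spark-shell

# Stop the Master and all Workers.
bin/stop-all.sh
```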

Hope that helps !!

Thursday, July 16, 2015

Installing Hadoop-1.x.x in Pseudo-Distributed Mode

Disclaimer: The installation steps shared in this blog post are typically for the hadoop-1.x.x series. If you are looking for hadoop-2.x.x series installation steps i.e. with YARN, this post isn’t the right place.

Hadoop installation can be done in the following three modes. This post elaborates the Pseudo-Distributed Mode.
  • Standalone Mode
In this mode Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging. This is the default mode for Hadoop.
  • Pseudo-Distributed Mode
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process. Such a mode is called Pseudo-Distributed mode.
  • Fully-Distributed Mode
In this mode we install, configure and manage non-trivial Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes.

Supported Platforms


GNU/Linux is supported as a development and production platform.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Required Software


Required software for Linux and Windows includes:
  • Java™ 1.6.x, preferably from Sun, must be installed.
  • ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
  • An additional requirement for Windows is Cygwin. It's required for shell support in addition to the software listed above.

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

          For example on Ubuntu Linux:


$ sudo apt-get install ssh
$ sudo apt-get install rsync                      

On Windows, if you did not install the required software when you installed Cygwin, start the Cygwin installer and select the package:
          openssh (from the Net category)

Download Hadoop


Obtain a Hadoop-1.x.x stable release from http://hadoop.apache.org/releases.html
  • Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
  • Try the following command:

$ bin/hadoop                                                      

        This will display the usage documentation for the hadoop script.
  • Now you are ready to start your Hadoop cluster in one of the three supported modes:
  1. Local (Standalone) Mode
  2. Pseudo-Distributed Mode
  3. Fully-Distributed Mode
For the Pseudo-Distributed Mode we need to configure 3 files, namely:


  • conf/core-site.xml
<configuration>  
      <property>     
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>                                                                     
      </property>
 </configuration>
  • conf/hdfs-site.xml
<configuration>
      <property>
            <name>dfs.replication</name>    
             <value>1</value>   
      </property>
</configuration>
  • conf/mapred-site.xml
    <configuration>   
         <property>     
              <name>mapred.job.tracker</name>     
              <value>localhost:9001</value>   
          </property>
    </configuration>


Setup Passphraseless SSH


          We need to ssh to the localhost without a passphrase:
             $ ssh localhost 
          If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
(This command generates a DSA key pair; the private key is stored in ~/.ssh/id_dsa and the public key in ~/.ssh/id_dsa.pub)
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
(This command copies the public key to the set of authorized keys of the system)
This completes the ssh passphraseless setup process.


Start and Stop the Cluster

  • Format a new distributed-file system:
$ bin/hadoop namenode -format
  • Start the hadoop daemons:
$ bin/start-all.sh
  •  Accessible UI
NameNode UI Port - 50070
JobTracker UI Port – 50030
  • Stop the hadoop daemons:
$ bin/stop-all.sh
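After starting the daemons, it is worth confirming that all five of them came up and that HDFS accepts commands. A quick check (the order of the jps output may differ on your machine, and the /tmp/smoketest path is just an example):

```shell
# jps lists the running Java processes; a healthy pseudo-distributed
# setup shows NameNode, DataNode, SecondaryNameNode, JobTracker and
# TaskTracker (plus jps itself).
jps

# Exercise HDFS with a trivial round trip.
bin/hadoop fs -mkdir /tmp/smoketest
bin/hadoop fs -ls /tmp
```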


Friday, June 26, 2015

Installation Script for Apache Storm on Ubuntu

One of my blogs here describes the steps for manual installation of a Storm cluster. To make things more convenient, here's an installation script that you can use for setting up a Storm cluster on Linux machines. Although the script should work for older versions of Apache Storm, it has been tested with storm-0.9.0-wip4. The script prints a descriptive message for each input it expects from you. The installation is done in the '/opt' folder of the machines, in a sub-directory of your choice. Make sure the user installing the cluster has admin rights on the /opt folder. The script also takes care of installing all the required dependencies. To use the script with versions other than the supported one, replace every "storm-0.9.0-wip4" occurrence in the script with your Storm version.


#!/bin/bash
# Local FS Setups location
echo "Enter the location of the setups folder. For example '/home/abc/storminstallation/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory name"
read -e realTimePlatformDir
rtpLocalDir="\/opt\/$realTimePlatformDir\/storm\/storm_temp"
rtpLocalDirMake=/opt/$realTimePlatformDir/storm/storm_temp
echo $rtpLocalDir;
echo "Enter the IP of nimbus machine :"
read -e stormNimbus;
array[0]=$stormNimbus


# Read supervisor
echo "Enter the number of supervisor machines";
read -e n;
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the IP of storm supervisor machine $i:"
read -e stormSupervisor;
array[i]=$stormSupervisor
done

# Read zookeeper
echo "Enter the number of machines in the zookeeper cluster";
read -e m;
for ((  i = 1 ;  i <= m;  i++  ))
do
echo "Enter the IP of zookeeper machine $i:"
read -e zkServer;
zkEntry="- \""$zkServer"\""
zKArray=$zKArray","$zkEntry
done

# Copy the required setups to all the storm machines
for ((  i = 1 ;  i <= n+1;  i++  ))
do
echo "Enter the username of machine ${array[i-1]}"
read -e username
echo "Username:"$username
if [ "$username" == 'root' ]; then
echo 'root';
yamlFilePath="/root/.storm";
else
echo $username;
yamlFilePath="/home/$username/.storm";
fi
echo "the storm.yaml file would be formed at : $yamlFilePath";
echo "Enter the value for JAVA_HOME to be set on the machine ${array[i-1]}"
read -e javaHome;
echo 'JAVA_HOME would be set to :'$javaHome;
ssh -t $username@${array[i-1]} "if [ ! -d /opt/$realTimePlatformDir ]; then
           sudo mkdir /opt/$realTimePlatformDir;
           sudo chown -R $username: /opt/$realTimePlatformDir;
           mkdir /opt/$realTimePlatformDir/storm;
           mkdir $rtpLocalDirMake;
           mkdir $yamlFilePath;
        fi"
     
scp -r -q $setupsLocation/storm-0.9.0-wip4 $username@${array[i-1]}:/opt/$realTimePlatformDir/storm/storm-0.9.0-wip4
     
ssh -t $username@${array[i-1]} "sed -i 's/ZOOKEEPER_IPS/$zKArray/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/,/\n/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/NIMBUS_IP/$stormNimbus/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/LOCAL_DIR/$rtpLocalDir/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
cp /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml $yamlFilePath;
sudo apt-get install git;
sudo apt-get install uuid-dev;"

ssh -t $username@${array[i-1]} "cd /opt/$realTimePlatformDir/storm;
                        wget http://download.zeromq.org/zeromq-2.1.7.tar.gz;
                        tar -xzf zeromq-2.1.7.tar.gz
                        cd zeromq-2.1.7                    
                        ./configure
                        make
                        sudo make install

                        cd ..
                        export JAVA_HOME=$javaHome;
                        echo $JAVA_HOME;
                        git clone https://github.com/nathanmarz/jzmq.git
                        cd jzmq
                        ./autogen.sh
                        ./configure
                        make
                        sudo make install"

done
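Assuming the script above is saved as, say, install_storm.sh (the filename is an assumption), it is run like any other shell script and prompts for each value interactively:

```shell
# Make the script executable and run it; it will ask for the setups
# folder, the installation directory name, and the nimbus, supervisor
# and zookeeper IPs in turn.
chmod +x install_storm.sh
./install_storm.sh
```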


Yep Done! Hope it helped. My next post shares the installation script for CentOS, and the one after that a small start-up script for the installed Storm cluster.

Installation Script for Apache Zookeeper-3.3.5 on Linux

One of my previous blogs describes how to set up a Zookeeper cluster manually. Here's a quick fix: an installation script for the same. Store the following script in a .sh file, run it on your machine, and you can install a zookeeper cluster on a set of remote machines. The only prerequisite on those machines is the zookeeper-3.3.5 setup at a common location. You can also use this script for other versions with a bit of modification (replace the version used in the script with yours; it shouldn't have been hard-coded, I know.. my bad).



#!/bin/bash

zkServerEntryPart1="server.";
zkServerEntryPart2="=zoo";
zkServerEntryPart3=":2888:3888NEW_LINE";
zkServerEntry="";

# Local FS Setup location
echo "Enter the path of the folder in which the zookeeper setup is stored. For example '/home/abc/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory path where zookeeper is to be installed : "
read -e zookeeperSetupLocation

# Read the number of zookeeper servers
echo "Enter the number of machines in the zookeeper cluster : ";
read -e n;

# Read the zookeeper server details
for ((  i = 1 ;  i <= n;  i++  ))
do
# obtain the zookeeper server ips of all machines in the cluster
echo "Enter the IP of zookeeper machine $i:"
read -e zookeeperServer;
zookeeperServerIPList[i]=$zookeeperServer
temp=$zkServerEntryPart1""$i""$zkServerEntryPart2""$i""$zkServerEntryPart3;
zkServerEntry=$zkServerEntry""$temp;

# obtain the usernames for all the zookeeper servers
echo "Enter the username of machine ${zookeeperServerIPList[i]}"
read -e username
userNameList[i]=$username
done

# Copy the setup to all the zookeeper machines
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the data directory location of zookeeper for the machine ${zookeeperServerIPList[i]}"
read -e dataDir
     
# create the required folders on the machines
ssh -t ${userNameList[i]}@${zookeeperServerIPList[i]} "if [ ! -d $zookeeperSetupLocation ]; then
sudo mkdir $zookeeperSetupLocation;
sudo chown -R ${userNameList[i]}: $zookeeperSetupLocation;
fi
if [ ! -d $dataDir ]; then
sudo mkdir $dataDir;
sudo chown -R ${userNameList[i]}: $dataDir;
fi"

# copy the zookeeper setup at the specified location on the machines
scp -r -q $setupsLocation/zookeeper-3.3.5 ${userNameList[i]}@${zookeeperServerIPList[i]}:$zookeeperSetupLocation/zookeeper-3.3.5

# create and configure the 'zoo.cfg' and 'myid' files
ssh -t ${userNameList[i]}@${zookeeperServerIPList[i]} "touch $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "dataDir=$dataDir" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "syncLimit=2" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "initLimit=5" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo -e "clientPort=2181" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo  "$zkServerEntry" >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
sed -i 's/NEW_LINE/\n/g' $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg
touch $dataDir/myid;                           
echo -e $i >> $dataDir/myid;"

# update the /etc/hosts file
hostFileEntry=${zookeeperServerIPList[i]}" zoo"$i;
for ((  j = 1 ;  j <= n;  j++  ))
do
ssh -t ${userNameList[j]}@${zookeeperServerIPList[j]} "sudo cp /etc/hosts /etc/hosts.bak;
sudo cp /etc/hosts /etc/hosts1;
sudo chmod 777 /etc/hosts1;
sudo echo -e "$hostFileEntry" >> /etc/hosts1;
sudo mv /etc/hosts1 /etc/hosts;"
done
done
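The server.N lines that the script assembles for zoo.cfg (via the NEW_LINE placeholder and the later sed) end up in the standard Zookeeper form. The fragment below builds the same entries directly, for a hypothetical three-node ensemble:

```shell
# Build the server.N=zooN:2888:3888 entries the script writes into
# zoo.cfg; 2888 is the quorum/peer port and 3888 the leader-election
# port, and zooN are the hostnames the script adds to /etc/hosts.
n=3
entries=""
for (( i = 1; i <= n; i++ )); do
  entries="${entries}server.${i}=zoo${i}:2888:3888"$'\n'
done
printf '%s' "$entries"
```

For n=3 this prints server.1=zoo1:2888:3888 through server.3=zoo3:2888:3888, one per line, matching what the installed zoo.cfg should contain.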

Hope it helped. My next blog post is about a small script to start up the installed zookeeper cluster.