Friday, June 26, 2015

Start-up script for an installed Apache Storm Cluster

If you have installed a Storm cluster using my shell scripts from the previous blogs, or even otherwise, this script will save you from manually visiting each node and starting the appropriate service (nimbus/supervisor/UI) there. All you have to do is grab a remote machine and run the script. The script will ask for the required information and your cluster will be up. Also, the script should work equally well on both Ubuntu and CentOS.
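Since the script gathers everything through `read`, you can also drive it non-interactively by redirecting a here-document to its stdin. Here's a minimal sketch of that pattern (the function name and the path/IP values are hypothetical, just to illustrate):

```shell
#!/bin/bash
# Minimal sketch of the prompt/read pattern the start-up script uses;
# the here-document supplies the answers non-interactively.
ask_and_start() {
  read -r stormDir     # e.g. the Storm setup path
  read -r nimbusIP     # e.g. the nimbus machine's IP
  echo "would run: $stormDir/bin/storm nimbus on $nimbusIP"
}
ask_and_start <<'EOF'
/opt/stormSetup/storm-0.9.0-wip4
192.168.1.10
EOF
```

The same redirection works on the real script, as long as the answers appear in the exact order the prompts ask for them.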


#!/bin/bash

# Directory name
echo "Enter the path to the Storm setup on the machines (for example, /opt/stormSetup/storm-0.9.0-wip4):"
read -e stormDir

# Read the usernames and IPs of all the Storm cluster nodes
echo "Enter the IP of the nimbus machine :"
read -e stormNimbus
clusterMachineIPs[0]=$stormNimbus

echo "Enter the username of nimbus :"
read -e usernameNimbus
clusterMachineUsernames[0]=$usernameNimbus

# Read the supervisors
echo "Enter the number of supervisor machines:"
read -e n
for (( i = 1; i <= n; i++ ))
do
    echo "Enter the IP of storm supervisor machine $i:"
    read -e stormSupervisor
    clusterMachineIPs[i]=$stormSupervisor
    echo "Enter the username of machine ${clusterMachineIPs[i]}:"
    read -e username
    clusterMachineUsernames[i]=$username
done

# Start nimbus on the nimbus machine
# (sshpass assumes every node's password is 'root'; change it to match your cluster)
sshpass -p root ssh -o StrictHostKeyChecking=no $usernameNimbus@$stormNimbus "$stormDir/bin/storm nimbus" &

# Start the supervisor nodes
for (( i = 1; i <= n; i++ ))
do
    sshpass -p root ssh -o StrictHostKeyChecking=no ${clusterMachineUsernames[i]}@${clusterMachineIPs[i]} "$stormDir/bin/storm supervisor" &
done

# Start the UI on the nimbus machine
sshpass -p root ssh -o StrictHostKeyChecking=no $usernameNimbus@$stormNimbus "$stormDir/bin/storm ui" &


Visit the UI in your browser after a few minutes. Hope it shows up fine. Cheers!

Installation Script for Apache Storm on CentOS

CentOS and Ubuntu are two famous Linux distributions used pretty widely. My last post shares an installation script for a Storm cluster on Ubuntu machines, and this one is for CentOS. The few usage rules are just the same as for Ubuntu, but I'll restate them here. Although the script should work for other versions of Apache Storm, it has been tested with storm-0.9.0-wip7. The script has embedded descriptive messages for each input it expects from you. The installation is done in the '/opt' folder of the machines, in a sub-directory of your choice. Make sure the user installing the cluster has admin rights on the /opt folder. The script also takes care of installing all the required dependencies. To use the script for versions other than the supported one, you need to replace the "storm-0.9.0-wip7" occurrences in the script with your Storm version.
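Retargeting the script at a different Storm version is a one-line edit. A quick sketch (the file name and the target version 0.9.1 are hypothetical; here a one-line stand-in file is created purely to demonstrate the substitution):

```shell
# Create a one-line stand-in for the installer, just for demonstration;
# in practice you would run sed on your saved copy of the script.
printf 'scp -r -q $setupsLocation/storm-0.9.0-wip7 user@host:/opt/dir\n' > install-storm.sh

# Replace every occurrence of the supported version string in place.
sed -i 's/storm-0\.9\.0-wip7/storm-0.9.1/g' install-storm.sh

cat install-storm.sh   # the line now references storm-0.9.1
```

The dots in the version are escaped in the sed pattern so they match literally rather than as "any character".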


#!/bin/bash
# Local FS Setups location
echo "Enter the location of the setups folder. For example '/home/abc/storminstallation/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory name"
read -e realTimePlatformDir
# Path escaped for use as a sed replacement string later in the script
rtpLocalDir="\/opt\/$realTimePlatformDir\/storm\/storm_temp"
rtpLocalDirMake=/opt/$realTimePlatformDir/storm/storm_temp
echo $rtpLocalDir;
echo "Enter the IP of nimbus machine :"
read -e stormNimbus;
array[0]=$stormNimbus


# Read supervisor
echo "Enter the number of supervisor machines";
read -e n;
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the IP of storm supervisor machine $i:"
read -e stormSupervisor;
array[i]=$stormSupervisor
done

# Read zookeeper
echo "Enter the number of machines in the zookeeper cluster";
read -e m;
for ((  i = 1 ;  i <= m;  i++  ))
do
echo "Enter the IP of zookeeper machine $i:"
read -e zkServer;
zkEntry="- \""$zkServer"\""
zKArray=$zKArray","$zkEntry
done

# Copy the required setups to all the storm machines
for ((  i = 1 ;  i <= n+1;  i++  ))
do
echo "Enter the username of machine ${array[i-1]}"
read -e username
echo "Username:"$username
if [ "$username" = "root" ]; then
echo 'root';
yamlFilePath="/root/.storm";
else
echo $username;
yamlFilePath="/home/$username/.storm";
fi
echo "the storm.yaml file would be formed at : $yamlFilePath";
echo "Enter the value for JAVA_HOME to be set on the machine ${array[i-1]}"
read -e javaHome;
echo 'JAVA_HOME would be set to :'$javaHome;
ssh -t $username@${array[i-1]} "if [ ! -d /opt/$realTimePlatformDir ]; then
    sudo mkdir /opt/$realTimePlatformDir;
    sudo chown -R $username: /opt/$realTimePlatformDir;
    mkdir /opt/$realTimePlatformDir/storm;
    mkdir $rtpLocalDirMake;
    mkdir $yamlFilePath;
  fi"
     
scp -r -q $setupsLocation/storm-0.9.0-wip7 $username@${array[i-1]}:/opt/$realTimePlatformDir/storm/storm-0.9.0-wip7
     
ssh -t $username@${array[i-1]} "sed -i 's/ZOOKEEPER_IPS/$zKArray/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip7/conf/storm.yaml;
sed -i 's/,/\n/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip7/conf/storm.yaml;
sed -i 's/NIMBUS_IP/$stormNimbus/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip7/conf/storm.yaml;
sed -i 's/LOCAL_DIR/$rtpLocalDir/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip7/conf/storm.yaml;
cp /opt/$realTimePlatformDir/storm/storm-0.9.0-wip7/conf/storm.yaml $yamlFilePath;
sudo yum install -y git;
sudo yum install -y libuuid-devel;"

ssh -t $username@${array[i-1]} "cd /opt/$realTimePlatformDir/storm;
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz;
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install

cd ..
export JAVA_HOME=$javaHome;
echo \$JAVA_HOME;
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
sudo make install"

done


Hope this helps. My next post shares a small start-up script for the installed Storm cluster.

Installation Script for Apache Storm on Ubuntu

One of my blogs here describes the steps for manual installation of a Storm cluster. To make things more convenient for you, here's an installation script that you can use for setting up a Storm cluster on Linux machines. Although the script should work for other versions of Apache Storm, it has been tested with storm-0.9.0-wip4. The script has embedded descriptive messages for each input it expects from you. The installation is done in the '/opt' folder of the machines, in a sub-directory of your choice. Make sure the user installing the cluster has admin rights on the /opt folder. The script also takes care of installing all the required dependencies. To use the script for versions other than the supported one, you need to replace the "storm-0.9.0-wip4" occurrences in the script with your Storm version.
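One practical note: the script connects to each machine repeatedly over ssh/scp, so you will be prompted for passwords many times unless key-based login is set up first. A minimal sketch (the key path, username and node IP below are hypothetical):

```shell
# Generate a key pair once on the machine driving the installation.
# A temporary path is used here purely for illustration; in practice
# you would typically use the default ~/.ssh/id_rsa.
keyfile="$(mktemp -d)/id_rsa_storm"
ssh-keygen -t rsa -N "" -f "$keyfile" -q
ls "$keyfile" "$keyfile.pub"

# Then copy the public key to every node in the cluster, e.g.:
# ssh-copy-id -i "$keyfile.pub" storm@192.168.1.11
```

After that, the repeated `ssh -t` and `scp` calls in the installer go through without password prompts.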


#!/bin/bash
# Local FS Setups location
echo "Enter the location of the setups folder. For example '/home/abc/storminstallation/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory name"
read -e realTimePlatformDir
# Path escaped for use as a sed replacement string later in the script
rtpLocalDir="\/opt\/$realTimePlatformDir\/storm\/storm_temp"
rtpLocalDirMake=/opt/$realTimePlatformDir/storm/storm_temp
echo $rtpLocalDir;
echo "Enter the IP of nimbus machine :"
read -e stormNimbus;
array[0]=$stormNimbus


# Read supervisor
echo "Enter the number of supervisor machines";
read -e n;
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the IP of storm supervisor machine $i:"
read -e stormSupervisor;
array[i]=$stormSupervisor
done

# Read zookeeper
echo "Enter the number of machines in the zookeeper cluster";
read -e m;
for ((  i = 1 ;  i <= m;  i++  ))
do
echo "Enter the IP of zookeeper machine $i:"
read -e zkServer;
zkEntry="- \""$zkServer"\""
zKArray=$zKArray","$zkEntry
done

# Copy the required setups to all the storm machines
for ((  i = 1 ;  i <= n+1;  i++  ))
do
echo "Enter the username of machine ${array[i-1]}"
read -e username
echo "Username:"$username
if [ "$username" = "root" ]; then
echo 'root';
yamlFilePath="/root/.storm";
else
echo $username;
yamlFilePath="/home/$username/.storm";
fi
echo "the storm.yaml file would be formed at : $yamlFilePath";
echo "Enter the value for JAVA_HOME to be set on the machine ${array[i-1]}"
read -e javaHome;
echo 'JAVA_HOME would be set to :'$javaHome;
ssh -t $username@${array[i-1]} "if [ ! -d /opt/$realTimePlatformDir ]; then
           sudo mkdir /opt/$realTimePlatformDir;
           sudo chown -R $username: /opt/$realTimePlatformDir;
           mkdir /opt/$realTimePlatformDir/storm;
           mkdir $rtpLocalDirMake;
           mkdir $yamlFilePath;
        fi"
     
scp -r -q $setupsLocation/storm-0.9.0-wip4 $username@${array[i-1]}:/opt/$realTimePlatformDir/storm/storm-0.9.0-wip4
     
ssh -t $username@${array[i-1]} "sed -i 's/ZOOKEEPER_IPS/$zKArray/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/,/\n/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/NIMBUS_IP/$stormNimbus/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
sed -i 's/LOCAL_DIR/$rtpLocalDir/g' /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml;
cp /opt/$realTimePlatformDir/storm/storm-0.9.0-wip4/conf/storm.yaml $yamlFilePath;
sudo apt-get install -y git;
sudo apt-get install -y uuid-dev;"

ssh -t $username@${array[i-1]} "cd /opt/$realTimePlatformDir/storm;
wget http://download.zeromq.org/zeromq-2.1.7.tar.gz;
tar -xzf zeromq-2.1.7.tar.gz
cd zeromq-2.1.7
./configure
make
sudo make install

cd ..
export JAVA_HOME=$javaHome;
echo \$JAVA_HOME;
git clone https://github.com/nathanmarz/jzmq.git
cd jzmq
./autogen.sh
./configure
make
sudo make install"

done


Yep, done! Hope it helped. My next post shares the installation script for CentOS, and the one after that a small start-up script for the installed Storm cluster.

Start-up script for an installed Apache Zookeeper Cluster

If you have an installed Zookeeper-3.3.5 cluster, this script will save you from manually visiting each node and starting the zkServer there. All you have to do is grab a remote machine and run the script. The script will ask for the required information and your cluster will be up. Also, the script should work equally well on both Ubuntu and CentOS.


#!/bin/bash

# Directory Name
echo "Enter the zookeeper setup location :"
read -e zookeeperSetupLocation

# Read the ips and the usernames of all the machines in the zookeeper cluster
echo "Enter the number of machines in the cluster";
read -e n;
for (( i = 1; i <= n; i++ ))
do
    echo "Enter the IP of cluster machine $i:"
    read -e zkServer
    zkServerIPs[i]=$zkServer
    echo "Enter the username of machine ${zkServerIPs[i]}:"
    read -e username
    zkServerUsernames[i]=$username
done

# Start up the cluster
for (( i = 1; i <= n; i++ ))
do
    ssh ${zkServerUsernames[i]}@${zkServerIPs[i]} "$zookeeperSetupLocation/zookeeper-3.3.5/bin/zkServer.sh start;"
done


Thanks !!

Installation Script for Apache Zookeeper-3.3.5 on Linux

One of my previous blogs describes how to set up a Zookeeper cluster manually. Here's a quicker route: an installation script for the same. Store the content below in a .sh file, run it on your machine, and you can install a zookeeper cluster on a set of remote machines. All you need as a prerequisite is the zookeeper-3.3.5 setup at a common location on the machine you run the script from. You can also use this script for other versions with a bit of modification (replace the version hard-coded in the script with yours; it shouldn't have been hard-coded, I know.. my bad).



#!/bin/bash

# Pieces of each "server.N=zooN:2888:3888" entry for zoo.cfg; NEW_LINE is a
# placeholder that is later sed-replaced with a real newline
zkServerEntryPart1="server.";
zkServerEntryPart2="=zoo";
zkServerEntryPart3=":2888:3888NEW_LINE";
zkServerEntry="";

# Local FS Setup location
echo "Enter the path of the folder in which the zookeeper setup is stored. For example '/home/abc/setups'"
read -e setupsLocation
# Directory Name
echo "Enter the directory path where zookeeper is to be installed : "
read -e zookeeperSetupLocation

# Read the number of zookeeper servers
echo "Enter the number of machines in the zookeeper cluster : ";
read -e n;

# Read the zookeeper server details
for ((  i = 1 ;  i <= n;  i++  ))
do
# obtain the zookeeper server ips of all machines in the cluster
echo "Enter the IP of zookeeper machine $i:"
read -e zookeeperServer;
zookeeperServerIPList[i]=$zookeeperServer
temp="$zkServerEntryPart1$i$zkServerEntryPart2$i$zkServerEntryPart3";
zkServerEntry="$zkServerEntry$temp";

# obtain the usernames for all the zookeeper servers
echo "Enter the username of machine ${zookeeperServerIPList[i]}"
read -e username
userNameList[i]=$username
done

# Copy the setup to all the zookeeper machines
for ((  i = 1 ;  i <= n;  i++  ))
do
echo "Enter the data directory location of zookeeper for the machine ${zookeeperServerIPList[i]}"
read -e dataDir
     
# create the required folders on the machines
ssh -t ${userNameList[i]}@${zookeeperServerIPList[i]} "if [ ! -d $zookeeperSetupLocation ]; then
sudo mkdir $zookeeperSetupLocation;
sudo chown -R ${userNameList[i]}: $zookeeperSetupLocation;
fi
if [ ! -d $dataDir ]; then
sudo mkdir $dataDir;
sudo chown -R ${userNameList[i]}: $dataDir;
fi"

# copy the zookeeper setup at the specified location on the machines
scp -r -q $setupsLocation/zookeeper-3.3.5 ${userNameList[i]}@${zookeeperServerIPList[i]}:$zookeeperSetupLocation/zookeeper-3.3.5

# create and configure the 'zoo.cfg' and 'myid' files
ssh -t ${userNameList[i]}@${zookeeperServerIPList[i]} "touch $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo dataDir=$dataDir >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo syncLimit=2 >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo initLimit=5 >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo clientPort=2181 >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
echo $zkServerEntry >> $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
sed -i 's/NEW_LINE/\n/g' $zookeeperSetupLocation/zookeeper-3.3.5/conf/zoo.cfg;
touch $dataDir/myid;
echo $i >> $dataDir/myid;"

# update the /etc/hosts file on all the nodes
hostFileEntry="${zookeeperServerIPList[i]} zoo$i";
for (( j = 1; j <= n; j++ ))
do
ssh -t ${userNameList[j]}@${zookeeperServerIPList[j]} "sudo cp /etc/hosts /etc/hosts.bak;
sudo cp /etc/hosts /etc/hosts1;
sudo chmod 777 /etc/hosts1;
echo \"$hostFileEntry\" >> /etc/hosts1;
sudo mv /etc/hosts1 /etc/hosts;"
done
done
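To make the result concrete, here is what the generated zoo.cfg looks like for a hypothetical three-node cluster (hosts zoo1..zoo3) with data directory /var/zookeeper/data, once the NEW_LINE placeholders have been replaced with real newlines (printed via a here-document purely for illustration):

```shell
# Expected shape of the zoo.cfg the script writes on each node
# (hypothetical hostnames and data directory):
zooCfg=$(cat <<'EOF'
dataDir=/var/zookeeper/data
syncLimit=2
initLimit=5
clientPort=2181
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888
EOF
)
echo "$zooCfg"
```

Each node also gets a myid file in its data directory containing just its server number (1, 2 or 3), which is how it finds its own entry in this list.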

Hope it helped. My next blog post is about a small script to start up the installed Zookeeper cluster.

Monday, June 15, 2015

Install and run Augustus on CentOS

Hello folks! If you are visiting this blog you definitely know what Augustus is all about, but just in case, here's a short introduction taken directly from its makers:

Augustus is an open source system for building and scoring statistical models designed to work with data sets that are too large to fit into memory

Although the Augustus documentation is an elaborate and wonderful source of guidelines and information, this blog presents a crisp and condensed bunch of steps you can use to install Augustus and try one of the examples. So open a terminal on your machine and try the following steps:

Step 1. Python 2.6 needs to be installed on the machine as a prerequisite. If it is already installed, check your Python version by typing the command "python" in the terminal. If it is 2.6, the output should resemble:


Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.                                         
>>>                                                                                                                                                                                                                                           

Step 2. Run


sudo rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm                                                                       

Step 3. Run 


sudo yum install numpy                                                                                                                                                                                                                                 

Step 4. Enter the python shell and import


> python2.6
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.                                         
>>> import numpy
>>> numpy.__version__
'1.4.1'
>>>                                                                                                                                                                                                                                         

Step 5. Execute


mkdir AUGUSTUS_INSTALLATION                                                                                                                                                                                                                  

Step 6. Execute


wget http://augustus.googlecode.com/files/Augustus-0.4.4.0.tar.gz                                                                                                                                                                                  

Step 7. Execute 


tar -xzvf Augustus-0.4.4.0.tar.gz                                                                                                                                                                                                      

Step 8. Execute


cd Augustus-0.4.4.0   # check that the setup has a "bin" folder containing files like "AugustusPMMLConsumer"

Step 9. Run


sudo python2.6 setup.py install                                                                                                                                                                                                                                                                  

The last segment of the command output should resemble:


byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/baseline/tools/__init__.py to __init__.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/baseline/tools/userInitializeModels.py to userInitializeModels.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/baseline/tools/xml_fifo_io2.py to xml_fifo_io2.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/baseline/tools/fake_score_handler.py to fake_score_handler.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/baseline/tools/userMySQLInterface.py to userMySQLInterface.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/clustering/__init__.py to __init__.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/clustering/producer/Producer.py to Producer.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/clustering/producer/__init__.py to __init__.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/tree/__init__.py to __init__.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/tree/producer/Producer.py to Producer.pyc
byte-compiling /usr/lib/python2.6/site-packages/augustus/modellib/tree/producer/__init__.py to __init__.pyc
running install_scripts
copying build/scripts-2.6/__setpath.py -> /usr/bin
copying build/scripts-2.6/userInitializeConfigs -> /usr/bin
copying build/scripts-2.6/realpmml -> /usr/bin
copying build/scripts-2.6/unitable -> /usr/bin
copying build/scripts-2.6/AugustusBaselineProducer -> /usr/bin
copying build/scripts-2.6/fake_event_source -> /usr/bin
copying build/scripts-2.6/AugustusClusteringProducer -> /usr/bin
copying build/scripts-2.6/runfifo -> /usr/bin
copying build/scripts-2.6/AugustusTreeProducer -> /usr/bin
copying build/scripts-2.6/munge -> /usr/bin
copying build/scripts-2.6/fake_score_handler -> /usr/bin
copying build/scripts-2.6/userInitializeModels -> /usr/bin
copying build/scripts-2.6/AugustusPMMLConsumer -> /usr/bin
copying build/scripts-2.6/AugustusNaiveBayesProducer -> /usr/bin
copying build/scripts-2.6/userBuildMySQL -> /usr/bin
changing mode of /usr/bin/__setpath.py to 755
changing mode of /usr/bin/userInitializeConfigs to 755
changing mode of /usr/bin/realpmml to 755
changing mode of /usr/bin/unitable to 755
changing mode of /usr/bin/AugustusBaselineProducer to 755
changing mode of /usr/bin/fake_event_source to 755
changing mode of /usr/bin/AugustusClusteringProducer to 755
changing mode of /usr/bin/runfifo to 755
changing mode of /usr/bin/AugustusTreeProducer to 755
changing mode of /usr/bin/munge to 755
changing mode of /usr/bin/fake_score_handler to 755
changing mode of /usr/bin/userInitializeModels to 755
changing mode of /usr/bin/AugustusPMMLConsumer to 755
changing mode of /usr/bin/AugustusNaiveBayesProducer to 755
changing mode of /usr/bin/userBuildMySQL to 755
running install_egg_info
Writing /usr/lib/python2.6/site-packages/Augustus-0.4.4.0-py2.6.egg-info

Step 10. Running


python2.6                                                                                                                                                                                                                                                                              

should return


Python 2.6.5 (r265:79063, Apr 9 2010, 11:16:46)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.                                         
>>> import augustus.const as AUGUSTUS_CONST
>>> AUGUSTUS_CONST._AUGUSTUS_VER
'0.4.2.0'
>>>                                                                                                                                                                                                                                                                                                                      

Installation is complete.

Step 11. To run an example:


> cd AUGUSTUS_INSTALLATION/Augustus-0.4.4.0/examples/basic                                     
> python2.6 top-ten.py ../auto/data/training.nab                                                                                                                                                                            

Sample output:


Field: Date
      ( 5.38500%) '2000-12-14'
      ( 5.27600%) '2000-12-29'
      ( 5.10800%) '2000-12-22'
      ( 5.04100%) '2000-12-18'
      ( 4.99900%) '2000-12-04'
      ( 4.90000%) '2000-12-07'
      ( 4.83400%) '2000-12-03'
      ( 4.76000%) '2000-12-05'
      ( 4.71900%) '2000-12-11'
      ( 4.56300%) '2000-12-26'
Field: Color
      (28.86200%) 'Black'
      (24.45700%) 'Blue'
      (23.75900%) 'Green'
      (22.92200%) 'Red'
Field: Automaker
      (21.98000%) 'Mazda'
      (21.35700%) 'BMW'
      (19.88900%) 'Toyota'
      (18.56900%) 'Volvo'
      (18.20500%) 'Audi'                                                                                                                                                                                                                                                                                                                 

That’s it.  Hope it helped.

Installing SparkMLlib on Linux and Running SparkMLlib implementations

SparkMLlib is Spark's scalable machine learning library. It ships with Apache Spark, runs on any Hadoop2/YARN cluster without any pre-installation, and consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as underlying optimization primitives.

The key features of SparkMLlib include:

1. Scalability
2. Performance
3. User-friendly APIs
4. Integration with Spark and its other components

There is nothing special about installing MLlib; it is already included in Spark. So if your machine already has Spark installed and running, there is nothing extra to do for SparkMLlib. You can follow this link to install Spark in standalone mode if not already done.

Running Logistic Regression on SparkMllib


Logistic regression measures the relationship between the categorical dependent variable and one or more independent variables, which are usually continuous, by estimating probabilities. Logistic regression can be binomial or multinomial. Binomial or binary logistic regression deals with situations in which the observed outcome for a dependent variable can have only two possible types (for example, "dead" vs. "alive"). Multinomial logistic regression deals with situations where the outcome can have three or more possible types (e.g., "disease A" vs. "disease B" vs. "disease C").
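In terms of the knobs used in the run below (--algorithm LR --regType L2 --regParam 1.0), binary logistic regression fits a weight vector w by minimizing the averaged logistic loss plus an L2 penalty. Up to MLlib's constant-factor conventions for scaling the penalty, the objective is:

```latex
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \log\!\left(1 + e^{-y_i\, w^{\top} x_i}\right)
\;+\; \lambda \,\lVert w \rVert_2^2,
\qquad y_i \in \{-1, +1\},\quad \lambda = \texttt{regParam}.
```

The fitted model then predicts the probability of the positive class as 1 / (1 + e^(-w·x)).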

Spark provides the 'spark-submit' script to submit jobs to the Spark cluster. The jar spark-assembly-*-cdh*-hadoop*-cdh*.jar comprises all the algorithm implementations.

We shall now run Logistic Regression as below:

Step-1: Export the required environment variables



export JAVA_HOME='your_java_home'                                                                                            
export SPARK_HOME='your_spark_home'

Step-2: Gather the dataset to run the algorithm on



mkdir ~/SparkMLlib
cd ~/SparkMLlib/
wget https://sites.google.com/site/jayatiatblogs/attachments/sample_binary_classification_data.txt                       

Now that you have the data set, copy it to HDFS.


hdfs dfs -mkdir -p /user/${USER}/classification_data
hdfs dfs -put -f $HOME/SparkMLlib/sample_binary_classification_data.txt /user/${USER}/classification_data/                                                                             

Step-3: Submit the job to run Logistic Regression using the 'spark-submit' script



$SPARK_HOME/bin/spark-submit --class org.apache.spark.examples.mllib.BinaryClassification \
  --master local[2] \
  $SPARK_HOME/lib/spark-examples-1.2.0-cdh5.3.0-hadoop2.5.0-cdh5.3.0.jar \
  --algorithm LR --regType L2 --regParam 1.0 \
  /user/${USER}/classification_data/sample_binary_classification_data.txt

If all works fine, you should see the following after a long log message:


Test areaUnderPR = 1.0.
Test areaUnderROC = 1.0.                                                                                                                 

Let's do some cleanup of your HDFS.


hdfs dfs -rm -r -skipTrash /user/${USER}/classification_data                                                          

You can run the other implementations of SparkMLlib as well in a similar fashion with the required data.

Good luck.