Jayati Tiwari: Setting up Spark-0.7.x in Standalone Mode

Friday, September 4, 2015

A Spark cluster in Standalone mode comprises one Master and multiple Spark Worker processes. Standalone mode can be used either on a single local machine or on a cluster. This mode does not require any external resource manager such as Mesos.

To deploy a Spark Cluster in Standalone mode, the following steps need to be executed on any one of the nodes.

1. Download the spark-0.7.x setup from:
http://spark.apache.org/downloads.html

2. Extract the Spark setup
tar -xzvf spark-0.7.x-sources.tgz

3. Spark requires Scala's bin directory to be present in the PATH variable of the Linux machine. Scala 2.9.3 for Linux can be downloaded from:
http://www.scala-lang.org/downloads

4. Extract the Scala setup
tar -xzvf scala-2.9.3.tgz

5. Export the Scala home by appending the following line to "~/.bashrc" (for CentOS) or "/etc/environment" (for Ubuntu):
export SCALA_HOME=/location_of_extracted_scala_setup/scala-2.9.3
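
Step 3 asks for Scala's bin directory on the PATH, while the line above only sets SCALA_HOME. A minimal sketch of adding both to an rc file follows. It assumes Scala was extracted to /opt/scala-2.9.3 (a hypothetical path); the helper name append_once and the RC_FILE variable are illustrative, not part of Spark or Scala.

```shell
#!/bin/sh
# Hypothetical location of the extracted Scala setup; adjust to your machine.
SCALA_DIR="${SCALA_DIR:-/opt/scala-2.9.3}"
# Defaults to a scratch file so that running the sketch changes nothing real;
# set RC_FILE="$HOME/.bashrc" to apply it.
RC_FILE="${RC_FILE:-$(mktemp)}"

# append_once FILE LINE: add LINE to FILE unless an identical line exists.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

append_once "$RC_FILE" "export SCALA_HOME=$SCALA_DIR"
# Single quotes keep $PATH and $SCALA_HOME literal in the rc file.
append_once "$RC_FILE" 'export PATH=$PATH:$SCALA_HOME/bin'
# Repeating a call leaves the file unchanged.
append_once "$RC_FILE" "export SCALA_HOME=$SCALA_DIR"

cat "$RC_FILE"
```

After pointing RC_FILE at the real rc file, open a new shell (or source the file) for the variables to take effect.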

6. Spark can be compiled with "sbt" or built using Maven. This post describes the former method because of its simplicity of execution. To compile, change directory to the extracted Spark setup and execute the following command:
sbt/sbt package
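
Before building, it can help to confirm that the SCALA_HOME from step 5 points at a usable Scala setup. The following is a small sketch of such a check; the function name check_scala_home is hypothetical, and the check only looks for an executable bin/scala, which is an assumption about how the extracted Scala directory is laid out.

```shell
#!/bin/sh
# check_scala_home DIR: succeed if DIR is non-empty and contains an
# executable bin/scala; otherwise print the reason and fail.
check_scala_home() {
    if [ -z "$1" ]; then
        echo "SCALA_HOME is not set"
        return 1
    fi
    if [ ! -x "$1/bin/scala" ]; then
        echo "no executable scala found under $1/bin"
        return 1
    fi
    echo "Scala setup found at $1"
}

if check_scala_home "${SCALA_HOME:-}"; then
    echo "ready to run: sbt/sbt package"
else
    echo "fix SCALA_HOME before building"
fi
```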

7. Create a file (if not already present) called "spark-env.sh" in Spark's "conf" directory, by copying "conf/spark-env.sh.template", and add the SCALA_HOME variable declaration to it as described below:
export SCALA_HOME=<path to Scala directory>

The Web UI ports for the Spark Master and Worker can optionally be specified by appending the following to "spark-env.sh":
export SPARK_MASTER_WEBUI_PORT=8083
export SPARK_WORKER_WEBUI_PORT=8084
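
Step 7 can be sketched as a short script. This is an illustration under assumptions: CONF_DIR stands for Spark's "conf" directory, SCALA_DIR is a hypothetical Scala location, and when CONF_DIR is not set the script works in a scratch directory with a stand-in template so that it can be tried safely.

```shell
#!/bin/sh
# CONF_DIR would normally be the "conf" directory of the extracted Spark setup.
# Without it, use a scratch directory holding a stand-in template.
if [ -z "${CONF_DIR:-}" ]; then
    CONF_DIR="$(mktemp -d)"
    echo '# stand-in for conf/spark-env.sh.template' > "$CONF_DIR/spark-env.sh.template"
fi
SCALA_DIR="${SCALA_DIR:-/opt/scala-2.9.3}"   # hypothetical path

ENV_FILE="$CONF_DIR/spark-env.sh"

# Copy the template only if spark-env.sh is not already present.
[ -f "$ENV_FILE" ] || cp "$CONF_DIR/spark-env.sh.template" "$ENV_FILE"

# Append the settings from step 7 (port values are the ones used above).
cat >> "$ENV_FILE" <<EOF
export SCALA_HOME=$SCALA_DIR
export SPARK_MASTER_WEBUI_PORT=8083
export SPARK_WORKER_WEBUI_PORT=8084
EOF

cat "$ENV_FILE"
```

The append is not idempotent: running the script twice against the same file writes the three lines twice, so it is meant to be run once per setup.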

8. To specify the nodes that will act as Workers, list their IP addresses in "conf/slaves", one per line. For a cluster containing two worker nodes with IPs 192.10.0.1 and 192.10.0.2, "conf/slaves" would contain:
192.10.0.1
192.10.0.2
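
The same file can be produced from a list of worker IPs, as in the sketch below. CONF_DIR and WORKERS are illustrative variable names; CONF_DIR defaults to a scratch directory, and the format check is deliberately loose (digits and dots only), so it is not a full IP address validation.

```shell
#!/bin/sh
# Normally Spark's "conf" directory; a scratch directory by default.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
# Worker IPs from the example above, separated by spaces.
WORKERS="192.10.0.1 192.10.0.2"

# Start from an empty file: this overwrites any existing conf/slaves.
: > "$CONF_DIR/slaves"

for ip in $WORKERS; do
    case "$ip" in
        # Reject anything containing characters other than digits and dots.
        *[!0-9.]*) echo "skipping suspicious entry: $ip" >&2 ;;
        *) printf '%s\n' "$ip" >> "$CONF_DIR/slaves" ;;
    esac
done

cat "$CONF_DIR/slaves"
```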

This completes the setup process on one node.

To set up Spark on the other nodes of the cluster, copy the Spark and Scala setups to the same locations on each of the remaining nodes.
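
One way to do the copy is a loop over the entries of "conf/slaves" using scp, sketched below. Everything here is an assumption to adapt: the /opt paths are hypothetical, the helper names run and copy_to_workers are illustrative, and a real run needs SSH access to each worker plus an existing parent directory on the remote side. By default the script only prints the commands it would execute; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
SPARK_DIR="${SPARK_DIR:-/opt/spark-0.7.x}"   # hypothetical path
SCALA_DIR="${SCALA_DIR:-/opt/scala-2.9.3}"   # hypothetical path
# Normally conf/slaves; a scratch file with the example IPs by default.
if [ -z "${SLAVES_FILE:-}" ]; then
    SLAVES_FILE="$(mktemp)"
    printf '%s\n' 192.10.0.1 192.10.0.2 > "$SLAVES_FILE"
fi
DRY_RUN="${DRY_RUN:-1}"

# run CMD...: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy both setups to the same parent directory on every listed worker.
copy_to_workers() {
    while read -r host; do
        [ -n "$host" ] || continue
        for dir in "$SPARK_DIR" "$SCALA_DIR"; do
            # </dev/null keeps scp from consuming the rest of the host list.
            run scp -r "$dir" "$host:$(dirname "$dir")/" < /dev/null
        done
    done < "$SLAVES_FILE"
}

copy_to_workers
```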

Lastly, edit the /etc/hosts file on all the nodes to add the "IP HostName" entries of all the other nodes in the cluster.
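
As an illustration of the "IP HostName" format, the sketch below adds entries for the two example workers. The host names spark-worker1 and spark-worker2 are made up for this example, and HOSTS_FILE defaults to a scratch file; pointing it at /etc/hosts requires root privileges.

```shell
#!/bin/sh
# Scratch file by default; set HOSTS_FILE=/etc/hosts (as root) to apply.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# "IP HostName" pairs; the host names are hypothetical.
ENTRIES='192.10.0.1 spark-worker1
192.10.0.2 spark-worker2'

printf '%s\n' "$ENTRIES" | while read -r ip name; do
    # Exact match on the first field, so an IP is never added twice.
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$HOSTS_FILE"; then
        echo "already present: $ip"
    else
        printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
        echo "added: $ip $name"
    fi
done

cat "$HOSTS_FILE"
```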

Hope that helps!
