Jayati Tiwari: May 2011

Thursday, May 26, 2011

Frequent Errors during Installation or Startup of Oozie Server

Although the Oozie installation steps are documented quite clearly and make it look like a matter of a few minutes, in my experience it is not that easy. To make it easier for you, one of my previous posts gives the gist of all the docs that are meant to help you install Oozie. In this post, I provide solutions to the 5 most common errors that you may have run into.

Error 1 :

Cannot create /var/run/oozie/oozie.pid: Directory nonexistent

Solution :

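The pid directory /var/run/oozie does not exist, and the user running Oozie cannot create it. Changing the permissions of the run folder (from within /var) worked for me:

sudo chmod -cR 777 ./run

sudo chown root:root -R /var/run/

Note that these commands change the permissions and ownership of everything under /var/run, which can affect other services. A narrower fix is to create only the pid directory and give it to the oozie user:

sudo mkdir -p /var/run/oozie

sudo chown oozie:oozie /var/run/oozie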

Error 2 :

put: org.apache.hadoop.security.AccessControlException: Permission denied: user=jt, access=WRITE, inode="user":root:supergroup:rwxr-xr-x

Solution :

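Add the following entry to your Hadoop setup's conf/hdfs-site.xml and restart the NameNode:

<property>
<name>dfs.permissions</name>
<value>false</value>
</property>

This turns off HDFS permission checking altogether, so use it only on a development or test cluster.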
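If you prefer to script this change rather than edit the file by hand, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the file has the usual <configuration> root with <property>/<name>/<value> children. The function names are my own, and the rewrite drops XML comments and the <?xml-stylesheet?> line, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with property `name` set to `value`.

    An existing <property> with that <name> is updated in place; otherwise a
    new <property> element is appended under <configuration>.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("expected a <configuration> root element")
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:  # no matching <property> found: append a new one
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def update_site_file(path: str, name: str, value: str) -> None:
    """Apply set_hadoop_property to a file such as conf/hdfs-site.xml."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_hadoop_property(text, name, value))


if __name__ == "__main__":
    print(set_hadoop_property("<configuration></configuration>",
                              "dfs.permissions", "false"))
```

For example, update_site_file("conf/hdfs-site.xml", "dfs.permissions", "false") run from your Hadoop directory; the NameNode still has to be restarted afterwards.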

Error 3 :

put:org.apache.hadoop.hdfs.server.namenode.SafeModeException:Cannot create directory /user/jt/examples. Name node is in safe mode.

Solution :

Use "hadoop dfsadmin -safemode leave" to force the NameNode (NN) to leave safe mode.

Or use "hadoop dfsadmin -safemode wait" to block until the NN leaves safe mode by itself.

If you need to get your cluster up and running quickly, you can change the parameter dfs.safemode.threshold.pct in conf/hdfs-site.xml (renamed dfs.namenode.safemode.threshold-pct in later Hadoop releases).

If you set it to 0, the NN will not start in safe mode.
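If you copy files to HDFS from a script, it can help to check the NameNode's state before the put. The Python sketch below runs hadoop dfsadmin -safemode get and interprets its output. It assumes the hadoop launcher is on your PATH and that the command prints a line containing "Safe mode is ON" or "Safe mode is OFF"; the function names are my own.

```python
import subprocess


def parse_safemode_output(output: str) -> bool:
    """Return True if `dfsadmin -safemode get` output says safe mode is ON."""
    for line in output.splitlines():
        if "Safe mode is ON" in line:
            return True
        if "Safe mode is OFF" in line:
            return False
    raise ValueError(f"unexpected dfsadmin output: {output!r}")


def namenode_in_safemode(hadoop_bin: str = "hadoop") -> bool:
    """Ask the NameNode whether it is in safe mode (needs a running cluster)."""
    result = subprocess.run([hadoop_bin, "dfsadmin", "-safemode", "get"],
                            capture_output=True, text=True, check=True)
    return parse_safemode_output(result.stdout)


def wait_for_safemode_exit(hadoop_bin: str = "hadoop") -> None:
    """Block until the NameNode leaves safe mode on its own."""
    subprocess.run([hadoop_bin, "dfsadmin", "-safemode", "wait"], check=True)
```

A script would call namenode_in_safemode() and, if it returns True, call wait_for_safemode_exit() before running hadoop fs -put. Waiting is gentler than forcing the NameNode out with -safemode leave.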

Error 4 :

E0902: Exception occured: [java.io.IOException: Call to localhost/127.0.0.1:9000 failed on local exception: java.io.EOFException]

Solution :

Check whether the JobTracker and NameNode port numbers in the job.properties file of the application you are running match the ports in your Hadoop configuration.
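To make this check less error-prone, here is a Python sketch that compares the ports in an app's job.properties with your Hadoop configuration. It assumes the Hadoop 0.20 property names fs.default.name (in core-site.xml, a URI such as hdfs://localhost:9000) and mapred.job.tracker (in mapred-site.xml, host:port), plus the nameNode and jobTracker keys used in the example job.properties. The helper names are my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def read_site_xml(xml_text: str) -> dict:
    """Map each <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.findall("property")
    }


def read_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def port_of(address: str):
    """Port of 'hdfs://host:port' or 'host:port'; None if absent or not numeric."""
    if "://" not in address:
        address = "//" + address  # let urlparse treat host:port as a netloc
    try:
        return urlparse(address).port
    except ValueError:  # e.g. an unreplaced placeholder such as NNPortNo
        return None


def port_mismatches(job_props: str, core_site: str, mapred_site: str) -> list:
    """Describe every port in job.properties that disagrees with Hadoop's config."""
    job = read_properties(job_props)
    # Assumption: Hadoop 0.20 property names.
    checks = [
        ("nameNode", job.get("nameNode", ""),
         "fs.default.name", read_site_xml(core_site).get("fs.default.name", "")),
        ("jobTracker", job.get("jobTracker", ""),
         "mapred.job.tracker", read_site_xml(mapred_site).get("mapred.job.tracker", "")),
    ]
    problems = []
    for job_key, job_value, hadoop_key, hadoop_value in checks:
        job_port = port_of(job_value)
        if job_port is None or job_port != port_of(hadoop_value):
            problems.append(f"{job_key}={job_value!r} does not match "
                            f"{hadoop_key}={hadoop_value!r}")
    return problems
```

Read conf/core-site.xml, conf/mapred-site.xml and the app's job.properties into strings and print each entry returned by port_mismatches. An empty list means the ports agree.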

Error 5 :

Hadoop StartUp Issue : Hadoop fs command not working and datanode is not running

Solution :

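The namespaceID in localpath_to_hadoop_data_store/dfs/data/current/VERSION (the DataNode's copy) must be the same as the one in localpath_to_hadoop_data_store/dfs/name/current/VERSION (the NameNode's). If they differ, which typically happens after the NameNode has been reformatted, change the DataNode's namespaceID to match the NameNode's and restart the DataNode(s).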
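The check and the edit can be scripted. Below is a minimal Python sketch that treats VERSION as a plain key=value file and copies the NameNode's namespaceID into the DataNode's file, keeping a .bak backup. It assumes the field is named namespaceID, as in Hadoop 0.20's VERSION files, and that the DataNode is stopped before you run it. The function names are my own.

```python
import shutil


def read_version(text: str) -> dict:
    """Parse a Hadoop VERSION file: key=value lines, '#' lines are comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def ids_match(name_version: str, data_version: str) -> bool:
    """True if the NameNode and DataNode VERSION texts share a namespaceID."""
    return (read_version(name_version).get("namespaceID")
            == read_version(data_version).get("namespaceID"))


def with_namenode_id(name_version: str, data_version: str) -> str:
    """Return the DataNode VERSION text carrying the NameNode's namespaceID."""
    nn_id = read_version(name_version)["namespaceID"]
    lines = [
        f"namespaceID={nn_id}" if line.strip().startswith("namespaceID=") else line
        for line in data_version.splitlines()
    ]
    return "\n".join(lines) + "\n"


def sync_namespace_id(name_path: str, data_path: str) -> bool:
    """Rewrite data_path if needed, keeping a copy in data_path + '.bak'.

    Returns True if the DataNode file was changed. Stop the DataNode first.
    """
    with open(name_path) as f:
        name_text = f.read()
    with open(data_path) as f:
        data_text = f.read()
    if ids_match(name_text, data_text):
        return False
    shutil.copy2(data_path, data_path + ".bak")
    with open(data_path, "w") as f:
        f.write(with_namenode_id(name_text, data_text))
    return True
```

Pass the two VERSION paths from above to sync_namespace_id, then start the DataNode again.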

If you were stuck on one of these points, I hope this helped. All the very best with Oozie!

Try On Oozie

This post assumes that you have:

Installed Oozie on your Linux machine
Installed Hadoop 0.20.1+

If not, see my previous post on Oozie installation.

Steps to get an Oozie app running

Having started Hadoop and the Oozie server, follow the steps below to get a sample Oozie application running:

If Oozie was installed from the Debian package, you can find the Oozie examples tar.gz at /etc/oozie/doc/oozie; otherwise it is in the Oozie setup folder.
Extract it; the resulting examples folder contains the apps, input-data and src subdirectories.
Add the following properties to the conf/core-site.xml of your Hadoop setup:

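<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>OOZIE_SERVER_HOSTNAME</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>USER_GROUPS_ALLOWED_FOR_IMPERSONATION</value>
</property>

The two values are placeholders: set the first to the host running the Oozie server (localhost on a single machine) and the second to the groups whose users Oozie may submit jobs on behalf of, then restart Hadoop.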
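Because a property without a <value> is easy to miss, here is a small Python sketch that reports which of the two proxyuser settings are absent or empty in core-site.xml. It assumes Oozie runs as the Unix user oozie, as listed in the prerequisites of the Oozie Installation post, since that user name is part of the property names. The function name is my own.

```python
import xml.etree.ElementTree as ET

# Property names for a proxy user called "oozie" (assumed Oozie service user).
REQUIRED = ("hadoop.proxyuser.oozie.hosts", "hadoop.proxyuser.oozie.groups")


def missing_proxyuser_settings(core_site_xml: str) -> list:
    """Return the required proxyuser properties that are missing or empty."""
    root = ET.fromstring(core_site_xml)
    values = {
        (p.findtext("name") or "").strip(): (p.findtext("value") or "").strip()
        for p in root.findall("property")
    }
    return [name for name in REQUIRED if not values.get(name)]
```

Read conf/core-site.xml into a string and pass it in. Anything returned still needs a <value> before Hadoop will let Oozie submit jobs on behalf of other users.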

To run any of the apps, remember to set the JobTracker and NameNode port numbers in the app's job.properties file according to your Hadoop configuration.

The JobTracker port is set in conf/mapred-site.xml (property mapred.job.tracker)

and the NameNode port in conf/core-site.xml (property fs.default.name).

Replace 'JTPortNo' and 'NNPortNo' in job.properties accordingly, as below:

oozie.wf.application.path=hdfs://localhost:NNPortNo/path_to_examples/apps/map-reduce
jobTracker=localhost:JTPortNo
nameNode=hdfs://localhost:NNPortNo
queueName=default
outputDir=map-reduce
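If you run several examples, filling in the placeholders by hand gets repetitive. The Python sketch below reads the ports from fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml), which I am assuming are your Hadoop 0.20 property names, and substitutes them for NNPortNo and JTPortNo in a job.properties template. The function names are my own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def site_value(xml_text: str, name: str) -> str:
    """Value of property `name` in a Hadoop *-site.xml document."""
    for prop in ET.fromstring(xml_text).findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    raise KeyError(name)


def port_of(address: str) -> str:
    """Port from 'hdfs://host:port' or 'host:port', as a string."""
    if "://" not in address:
        address = "//" + address  # make urlparse see host:port as the netloc
    port = urlparse(address).port
    if port is None:
        raise ValueError(f"no port in {address!r}")
    return str(port)


def fill_job_properties(template: str, core_site: str, mapred_site: str) -> str:
    """Replace NNPortNo and JTPortNo with the ports Hadoop is configured to use."""
    # Assumption: Hadoop 0.20 property names.
    nn_port = port_of(site_value(core_site, "fs.default.name"))
    jt_port = port_of(site_value(mapred_site, "mapred.job.tracker"))
    return template.replace("NNPortNo", nn_port).replace("JTPortNo", jt_port)
```

Feed it the text of the app's job.properties and your two site files, and write the result back to job.properties.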

Now copy the examples directory to HDFS. If an examples directory already exists in HDFS, delete it first; otherwise the files are not copied. Here's the command:

/path_to_hadoopdir/bin/hadoop fs -put /path_to_egdir/examples examples
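The delete-then-copy step can be wrapped in a short Python sketch using subprocess. It assumes the Hadoop 0.20 shell commands hadoop fs -test -d (exit status 0 when the directory exists), hadoop fs -rmr and hadoop fs -put. The run parameter exists only so the commands can be checked without a cluster, and the function name is my own.

```python
import subprocess
from typing import Callable, List


def upload_examples(hadoop_bin: str, local_dir: str, hdfs_dir: str = "examples",
                    run: Callable = subprocess.run) -> List[List[str]]:
    """Copy local_dir to hdfs_dir, deleting an existing hdfs_dir first.

    Returns the commands that were executed, in order.
    """
    executed: List[List[str]] = []

    def call(cmd: List[str], check: bool = True):
        executed.append(cmd)
        return run(cmd, check=check)

    # `fs -test -d` exits with 0 if hdfs_dir exists and is a directory.
    if call([hadoop_bin, "fs", "-test", "-d", hdfs_dir], check=False).returncode == 0:
        call([hadoop_bin, "fs", "-rmr", hdfs_dir])  # recursive delete in 0.20
    call([hadoop_bin, "fs", "-put", local_dir, hdfs_dir])
    return executed
```

For example, upload_examples("/path_to_hadoopdir/bin/hadoop", "/path_to_egdir/examples") performs the same copy as the command above, removing a stale copy first.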

To confirm that the copy succeeded, browse HDFS from the NameNode web interface at

http://localhost:50070

Run the following command to start the example.

If Oozie was installed from the Debian package:

/usr/lib/oozie/bin/oozie job -oozie http://localhost:11000/oozie -config /path_to_egdir/examples/apps/map-reduce/job.properties -run

otherwise:

/path_to_oozie/bin/oozie job -oozie http://localhost:11000/oozie -config /path_to_egdir/examples/apps/map-reduce/job.properties -run

Note that -config takes the path to job.properties on the local file system, not its HDFS path.
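To avoid passing an HDFS path by mistake, a wrapper can refuse anything that looks like a URI. Here is a minimal Python sketch that builds the oozie job ... -run command line shown above. The default Oozie URL is the one used in this post, and the function names are my own.

```python
import os
import subprocess
from typing import List

DEFAULT_OOZIE_URL = "http://localhost:11000/oozie"


def oozie_run_command(oozie_bin: str, config_path: str,
                      oozie_url: str = DEFAULT_OOZIE_URL) -> List[str]:
    """Build the `oozie job ... -run` command for a local job.properties."""
    if "://" in config_path:
        # -config must name a file on the local file system, not in HDFS.
        raise ValueError(f"expected a local path, got {config_path!r}")
    return [oozie_bin, "job", "-oozie", oozie_url,
            "-config", os.path.abspath(config_path), "-run"]


def submit(oozie_bin: str, config_path: str) -> str:
    """Submit the workflow and return the CLI's standard output."""
    result = subprocess.run(oozie_run_command(oozie_bin, config_path),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

The path is made absolute so the command still works if the script is run from a different directory.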

If the application starts successfully, the command returns a job ID, something like this:

                    job: 14-20090525161321-oozie-tucu

If you have installed the web console, you can view the job's status at

http://localhost:11000/oozie

otherwise, use the following command:

/path_to_oozie/bin/oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-tucu
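If you script the submission, you can pull the job ID out of the CLI output and pass it to -info. Here is a Python sketch, assuming the output contains a line of the form job: <id> like the one shown above. The function names are my own.

```python
import re
from typing import List

# Matches a line such as "job: 14-20090525161321-oozie-tucu".
_JOB_LINE = re.compile(r"^\s*job:\s*(\S+)\s*$", re.MULTILINE)


def parse_job_id(run_output: str) -> str:
    """Extract the job ID printed by `oozie job ... -run`."""
    match = _JOB_LINE.search(run_output)
    if match is None:
        raise ValueError(f"no job ID in output: {run_output!r}")
    return match.group(1)


def oozie_info_command(oozie_bin: str, job_id: str,
                       oozie_url: str = "http://localhost:11000/oozie") -> List[str]:
    """Build the `oozie job ... -info` command that reports a job's status."""
    return [oozie_bin, "job", "-oozie", oozie_url, "-info", job_id]
```

Run the list returned by oozie_info_command with subprocess.run to print the same status report as the command above.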

That's it! You can follow the same steps to run any of the documented examples. If things have not gone as smoothly as they seem, my next post, on errors during installation and startup of Oozie, may have the answer.

Oozie Installation

This post provides a consolidated list of the commands required to install Oozie, since the documentation includes many optional steps and requires you to go through many links to get to the correct procedure.

As mentioned in my previous post, Oozie has two flavors: the Cloudera distribution and the Yahoo distribution. This post covers two ways of installing Cloudera's Oozie:

Installing Oozie debian package
Installing Oozie tarball

Installing Cloudera's Oozie

Prerequisites

A Unix-like system (tested on CentOS 5.5, Ubuntu 9.10+, SUSE Linux Enterprise Server 11, OS X 10.6)
Java 1.6+ (tested with JRE 1.6.0_20)
A Unix user and group named oozie on your machine

Installing Oozie debian package

Oozie is available as separate Debian packages for the server (oozie) and the client (oozie-client).

1. Download them from:

http://archive.cloudera.com/debian/pool/contrib/o/

2. Before proceeding, add Cloudera's APT repository by creating a new text file, /etc/apt/sources.list.d/cloudera.list, with the following two lines:

deb http://archive.cloudera.com/debian <RELEASE>-cdh3 contrib
deb-src http://archive.cloudera.com/debian <RELEASE>-cdh3 contrib

where <RELEASE> is replaced by your release's codename, as printed by the command lsb_release -c.

3. Run the following command:

$ sudo apt-get update

4. Then install the Oozie server and client from the Debian packages.

5. Start the Oozie server:

$ sudo -u oozie /usr/lib/oozie/bin/oozie-start.sh

6. To stop the Oozie server:

$ sudo -u oozie /usr/lib/oozie/bin/oozie-stop.sh
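Step 2 of the Debian install can also be scripted. The Python sketch below reads the codename from lsb_release -c, assuming it prints a single line of the form Codename: followed by the name, and produces the two repository lines. It only returns the text, so you can review it before writing it to /etc/apt/sources.list.d/cloudera.list as root. The function names are my own.

```python
import subprocess


def codename_from_lsb(output: str) -> str:
    """Extract the codename from `lsb_release -c` output ("Codename:\t<name>")."""
    key, _, value = output.strip().partition(":")
    if key.strip() != "Codename" or not value.strip():
        raise ValueError(f"unexpected lsb_release output: {output!r}")
    return value.strip()


def cloudera_list(codename: str) -> str:
    """Contents of /etc/apt/sources.list.d/cloudera.list for CDH3."""
    base = "http://archive.cloudera.com/debian"
    return (f"deb {base} {codename}-cdh3 contrib\n"
            f"deb-src {base} {codename}-cdh3 contrib\n")


def current_cloudera_list() -> str:
    """Run lsb_release -c on this machine and build the repository file."""
    out = subprocess.run(["lsb_release", "-c"], capture_output=True,
                         text=True, check=True).stdout
    return cloudera_list(codename_from_lsb(out))
```

Writing the file needs root, so one option is to print current_cloudera_list() and pipe it into sudo tee /etc/apt/sources.list.d/cloudera.list.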

Installing Oozie Tarball

1. Download the Oozie tarball (version 2.3.0+31.2) from:

https://ccp.cloudera.com/display/SUPPORT/CDH3+Downloadable+Tarballs

2. Unpack the tarball in the appropriate directory. For example:

$ (cd /home/abc/ && sudo tar -zxvf <PATH_TO_OOZIE_TAR_GZ>)

3. Change ownership of the Oozie installation to oozie:oozie:

$ sudo chown -R oozie:oozie /home/abc/oozie-2.3.0+31.2

This installs both the client and the server; the directory contains all the necessary client and server files.

4. Start the Oozie server:

$ sudo -u oozie /home/abc/oozie-*/bin/oozie-start.sh

5. To stop the Oozie server:

$ sudo -u oozie /home/abc/oozie-*/bin/oozie-stop.sh

Oozie Web Console

The web console is optional, and command-line utilities can be used instead, but I recommend installing it: it gives a very clear picture of the running Oozie jobs and is very easy to install.

1. Download the ExtJS library from

http://extjs.com/deploy/ext-2.2.zip

2. Place it in a convenient location and add it to Oozie with the first command below for a Debian package install, or the second for a tarball install:

$ sudo -u oozie /usr/lib/oozie/bin/oozie-setup.sh -extjs /path_to_ExtJS/ext-2.2.z

$ sudo -u oozie /home/abc/oozie-*/bin/oozie-setup.sh -extjs /path_to_ExtJS/ext-2.2.zip

3. After starting the Oozie server as described above, you can view the console at:

http://localhost:11000/oozie

That's all about installing Oozie. My next post will be on how to run a sample Oozie application.

Tuesday, May 10, 2011

Introduction to Oozie

What is Oozie?

“Oozie is a server-based workflow engine specialized in running workflow jobs with actions that execute Hadoop jobs, such as MapReduce, Pig, Hive, Sqoop, HDFS operations, and sub-workflows” as stated on cloudera.com. 'Oozie' is the Burmese word for an elephant rider or handler, more commonly known by the Indian term 'mahout'. The name fits well, since this workflow engine likewise controls all the Hadoop jobs that are part of its workflow.

Oozie Flavors

Oozie comes in two flavors:

Cloudera distribution of Oozie
Yahoo distribution of Oozie

and you can combine either of them with Apache or Cloudera Hadoop:

Cloudera Oozie + Cloudera Hadoop
Cloudera Oozie + Apache Hadoop
Yahoo Oozie + Cloudera Hadoop
Yahoo Oozie + Apache Hadoop

The first combination works just fine and is easy to install using Debian packages. One of the major differences between the two distributions is that the Yahoo distribution of Oozie has no built-in support for running Hive actions (although patches that add this support are available), whereas the Cloudera distribution supports running Hive and Sqoop jobs and includes sample workflow applications for them in its set of examples.

Need Assessment of Oozie

Let's start with some stats. According to the Oozie presentation at the Hadoop Summit in June, there are more than 4,800 workflow applications deployed within Yahoo! at the moment, with the largest workflow containing 2,000 actions.

It is very difficult to manage and run such workflows repeatedly without a workflow engine that automates these jobs. Even in many of our small applications, we need a controller that can smoothly execute a given set of jobs and notify us only when user intervention is needed or something fails. If these jobs are Hadoop jobs, Oozie is the best choice one can make.

Oozie Highlights

Runs a series of MapReduce, Hive, Pig, Java and script actions as a single workflow job
Supports regular scheduling of workflow jobs
Workflows are written in XML and expressed as a Directed Acyclic Graph (DAG) of actions
Supported action types: MapReduce (Java, streaming, pipes), Pig, Java, file system, SSH, Hive, Sqoop, sub-workflow
Supports variables and functions for parameterization of workflows
Supports decision nodes, allowing the workflow to make decisions
Interval job scheduling is based on time and on the availability of input data
Runs as a server (multiple users, multiple workflows)
Actions run in the Hadoop cluster as the user who submitted the workflow
Stores workflow state in a SQL database (such as Derby); a workflow's state is held in memory only during a state transition
In case of fail-overs, running workflows continue from their current state

I hope this post gave you an idea of what Oozie is all about. In my next post, I'll describe the steps to install Oozie and get started with it.