Thursday, May 26, 2011

Oozie Installation


This post provides a consolidated list of the commands required to install Oozie, since the official documentation includes many optional steps and makes you follow many links before you arrive at the correct procedure.
As mentioned in my previous post, Oozie comes in two flavors: the Cloudera distribution and the Yahoo distribution. This post covers installing Cloudera's Oozie using two methods:
  • Installing Oozie debian package
  • Installing Oozie tarball 

Installing Cloudera's Oozie

Prerequisites

  • A Unix-like system (tested on CentOS 5.5, Ubuntu 9.10+, SUSE Linux Enterprise Server 11, OS X 10.6)
  • Java 1.6+ (tested with JRE 1.6.0_20)
  • A Unix user and group named oozie on your machine

Installing Oozie debian package

Oozie is packaged for Debian as two separate packages: the Oozie server (oozie) and the client (oozie-client).

  1. Download them from the following link :
  
  2. Before proceeding further, you need to add Cloudera's APT repository. Create a new plain-text file, /etc/apt/sources.list.d/cloudera.list, with the following two lines of content:
  • deb http://archive.cloudera.com/debian <RELEASE>-cdh3 contrib
  • deb-src http://archive.cloudera.com/debian <RELEASE>-cdh3 contrib
where <RELEASE> is your distribution's codename, as reported by the command lsb_release -c (for example, lucid).
  
  3. Refresh the package index:
  • $ sudo apt-get update
  4. After this, simply install the Oozie server and client from the Debian packages (oozie and oozie-client).
  5. Start the Oozie server:
  • $ sudo -u oozie /usr/lib/oozie/bin/oozie-start.sh
  6. Stop the Oozie server:
  • $ sudo -u oozie /usr/lib/oozie/bin/oozie-stop.sh
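
The Debian steps above can be sketched as a small script. This is only a sketch under stated assumptions, not a definitive procedure: the helper name cloudera_sources_list is hypothetical, the package names oozie and oozie-client are the ones mentioned above, and the privileged commands are left commented out so that nothing on your system changes until you run them yourself.

```shell
#!/bin/sh
# Print the two repository lines for a given release codename (e.g. "lucid").
# cloudera_sources_list is a hypothetical helper name used only for illustration.
cloudera_sources_list() {
    release="$1"
    printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$release"
    printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$release"
}

# Typical usage on the target machine (requires root, so left commented out):
#   cloudera_sources_list "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cloudera.list
#   sudo apt-get update
#   sudo apt-get install oozie oozie-client
```

Here lsb_release -cs prints only the codename, which avoids having to strip the "Codename:" label that lsb_release -c prints.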

Installing Oozie Tarball


  1. Download the Oozie tarball (version 2.3.0+31.2) from:
  
  2. Unpack the tarball in the appropriate directory. For example: 
  • $ (cd /home/abc/ && sudo tar -zxvf <PATH_TO_OOZIE_TAR_GZ>)
  
  3. Change ownership of the Oozie installation to oozie:oozie:
  • $ sudo chown -R oozie:oozie /home/abc/oozie-2.3.0+31.2
The tarball contains both the client and the server, so this single directory holds all the necessary client and server files.
  
  4. Start the Oozie server:
  • $ sudo -u oozie /home/abc/oozie-*/bin/oozie-start.sh
  5. Stop the Oozie server:
  • $ sudo -u oozie /home/abc/oozie-*/bin/oozie-stop.sh

Oozie Web Console

Although the web console is optional and the command-line utilities can be used instead, I recommend installing it: it presents a very clear picture of the running Oozie jobs and is very easy to set up.

  1. Download the ExtJS library from
  
  2. Place it in a convenient location and add it to Oozie with one of these commands, depending on whether you installed the Debian package or the tarball:
  • $ sudo -u oozie /usr/lib/oozie/bin/oozie-setup.sh -extjs /path_to_ExtJS/ext-2.2.z
                            or 
  • $ sudo -u oozie /home/abc/oozie-*/bin/oozie-setup.sh -extjs /path_to_ExtJS/ext-2.2.zip
  
  3. After starting the Oozie server as described above, you can view the console at:
  • http://localhost:11000/oozie
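
If the console does not load, it helps to first confirm from the command line that the server itself is up. The sketch below wraps the oozie admin -status command in a small check; the helper name oozie_is_normal is hypothetical, and it assumes the oozie client is on your PATH and that a healthy server reports "System mode: NORMAL".

```shell
#!/bin/sh
# Return success only if the Oozie server reports "System mode: NORMAL".
# oozie_is_normal is a hypothetical helper name; the default URL is the
# console address used in this post.
oozie_is_normal() {
    url="${1:-http://localhost:11000/oozie}"
    oozie admin -oozie "$url" -status 2>/dev/null | grep -q 'System mode: NORMAL'
}

# Usage (assumes the oozie client is on PATH):
#   oozie_is_normal && echo "Oozie is up" || echo "Oozie is not responding normally"
```

A NORMAL status only tells you that the server is running; a console that still stays blank then points at the ExtJS setup step rather than at the server.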

That's all about installing Oozie. My next post will cover how to run an Oozie sample application.

      32 comments:

      1. Hi, I followed the same steps and there was no error. But when I try to open http://localhost:11000/oozie it doesn't show anything; only the documentation link is displayed. Please guide me.
        Is it necessary to install as the 'oozie' user? I installed using the 'hadoop' user.

      2. Hi. Yes, installing as the 'oozie' user is a prerequisite, but before you try that I would suggest checking whether your Oozie server is installed successfully and running fine. To do that, you can try running some of the documented Oozie examples, as it's only a matter of a few steps.
        If that works, you can be sure that the problem lies with the web console installation.

      3. Thanks for the quick reply. I executed the command bin/oozie admin -oozie http://localhost:11000/oozie -status and got the response System mode: NORMAL.
        But when I try an example from http://yahoo.github.com/oozie/releases/2.3.0/DG_Examples.html, it throws this exception: Error: E0902 : E0902: Exception occured: [java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused]
        What could be wrong?

      4. OK, this is probably because you have not set the correct port numbers for the JobTracker and the NameNode in the job.properties file of your Oozie application. For detailed information, see my next post in the series:
        http://jayatiatblogs.blogspot.com/2011/05/try-on-oozie.html
        in case it is of some help.
        In any case, a NORMAL system mode indicates that your Oozie installation was successful.
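
A quick way to see which endpoints a job will actually use is to print them from job.properties. This is only a sketch: the helper name show_job_endpoints is hypothetical, and it assumes the file uses the keys nameNode and jobTracker, as the bundled Oozie examples do.

```shell
#!/bin/sh
# Print the nameNode and jobTracker entries from a job.properties file.
# show_job_endpoints is a hypothetical helper name; the key names are an
# assumption based on the job.properties files shipped with the Oozie examples.
show_job_endpoints() {
    grep -E '^(nameNode|jobTracker)=' "$1"
}

# Usage:
#   show_job_endpoints examples/apps/map-reduce/job.properties
```

The host and port shown for nameNode should match fs.default.name in your core-site.xml, and the one for jobTracker should match mapred.job.tracker in mapred-site.xml. A "Connection refused" on port 8020 usually means nothing is listening at the address the job was given.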

      5. The following error occurs while running a job from Oozie. Any ideas?

        Error: E0902 : Exception occurred: [org.apache.hadoop.ipc.RemoteException: User: oozie is not allowed to impersonate oozie]

        Thx
        Sada

        Replies
        1. Try adding these two properties to core-site.xml and then restart Hadoop:

          <property>
          <name>hadoop.proxyuser.oozie.hosts</name>
          <value>*</value>
          </property>
          <property>
          <name>hadoop.proxyuser.oozie.groups</name>
          <value>*</value>
          </property>

      6. Does your core-site.xml file contain entries for "hadoop.proxyuser.oozie.hosts" and "hadoop.proxyuser.oozie.groups" ?
        And which user are you using to submit the job?

      7. Yes, we do have the proxy entries in core-site.xml. Anyway, it's working after changing the proxy values to '*', though these settings are not recommended for production clusters.

      8. Yeah .. setting them to '*' generally works ...

      9. Hi, I am facing the same issue too:
        E0902: Exception occured: [org.apache.hadoop.ipc.RemoteException: User: oozie is not allowed to impersonate oozie

        I am using Hadoop 0.20.203 and Oozie 3.0.2.

        My core-site.xml contains the proxy user entries:
        <property>
        <name>hadoop.proxyuser.oozie.hosts</name>
        <value>*</value>
        </property>
        <property>
        <name>hadoop.proxyuser.oozie.groups</name>
        <value>*</value>
        </property>

        In my Oozie conf file, "oozie.service.HadoopAccessorService.kerberos.enabled" is set to false.

        Oozie starts without any error and I am able to access the web console.

        But when I try to run an Oozie job, I get the above error.

      10. Could you please try adding the following to your conf/oozie-site.xml and see if it works:

        <property>
        <name>oozie.services.ext</name>
        <value>
        org.apache.oozie.service.HadoopAccessorService
        </value>
        <description>
        To add/replace services defined in 'oozie.services' with custom implementations. Class names must be separated by commas.
        </description>
        </property>

      11. Jayati : I just wanted to say thanks *so* much for writing the above (10:55) response. I had been bashing my head against "oozie not allowed to impersonate ..." errors for a while before finding this post and your response.

      12. Glad to know it was helpful for you ...

      13. oozie job -oozie http://localhost:11000/oozie -config /usr/local/hadoop/temp/examples/apps/map-reduce/job.properties -run
        Error: E0902 : E0902: Exception occured: [java.io.IOException: Call to localhost/127.0.0.1:54310 failed on local exception: java.io.EOFException]

      14. Error: E0902 : E0902: Exception occured: [org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 63, server = 61)]
        I was facing the same problem as Suresh, so I tried the fix from your Nov 23, 2011 10:55 PM post. Now it gives the error above. Help needed. Thanks in advance.

      15. OK, that is strange. It looks like an issue with the distribution or the versions of Oozie and Hadoop. Could you please tell me which versions you are using?

      16. While running oozie job :
        bin/oozie job -oozie $OOZIE_URL -config examples/apps/map-reduce/job.properties -run
        Error: E0902 : E0902: Exception occured: [java.net.ConnectException: Call to localhost/127.0.0.1:8020 failed on connection exception: java.net.ConnectException: Connection refused]

        What is this port number 8020, and why is it using localhost? Thanks in advance.

      17. Please go through my next blog on Oozie at http://jayatiatblogs.blogspot.in/2011/05/try-on-oozie.html
        You'll probably find your answer.

      18. This comment has been removed by the author.

      19. Hi Jayati,
        Can you please help me delete old Oozie workflows? They are causing a deadlock problem on my node.

      20. We are trying to create a coordinator workflow on oozie-2.3.2+27.19, but we run into a problem when the number of coordinators exceeds two. We get the following error:


        E0607: Other error in operation [getWorkflow], An optimistic lock violation was detected when flushing object instance "A lock could not be obtained within the time requested [java.lang.String]" to the data store. This indicates that the object was concurrently modified in another transaction.
        org.apache.oozie.command.CommandException: E0607: Other error in operation [getWorkflow], An optimistic lock violation was detected when flushing object instance "A lock could not be obtained within the time requested [java.lang.String]" to the data store. This indicates that the object was concurrently modified in another transaction.
        at org.apache.oozie.command.Command.call(Command.java:259)

      21. Hi Jayati,
        I executed the command bin/oozie admin -oozie http://localhost:11000/oozie -status and got the following error: Error: HTTP error code: 404 : Not Found

      22. Hi Jayati,

        Your posts above were really helpful.

        I am new to both Hadoop and Oozie. I am starting with an Oozie installation on an Apache Hadoop cluster with one master node and two DataNodes. My doubt is where to install the Oozie server and client in my cluster, i.e. on a DataNode or on the NameNode.
        Waiting for your reply!

        Regards,
        Rishabh

      23. Hello Rishabh,

        You can run your Oozie server on either of them. However, the NameNode is the recommended option, since you will not have to worry about specifying the NameNode's hostname in your config files and hosts file; you can simply use localhost.

      24. Hello Jayati,

        I have installed an Oozie server on a single-node cluster, but I get a "404 Not Found" error when I execute the following command:

        /usr/local/oozie-3.0.2/bin/oozie job -oozie http://localhost:11000/oozie -config /usr/local/oozie-3.0.2/examples/apps/map-reduce/job.properties -run

        I have also copied the oozie.services property from oozie-default.xml into oozie-site.xml and removed
        kerberos from kerberosHadoopAccessorService, but the error is still the same.

        What else can I do to resolve it?

        Thanks for your earlier reply; it was helpful.

        Regards,
        Rishabh

      25. Hi Jayati,
        I am getting:

        + oozie job -oozie http://hmaster:11000/oozie -Doozie.wf.application.path=hdfs://hmaster:8020/user/honglin.wang/deploy/run/oozie/ -config tmp.job.properties -run
        Error: E0607 : E0607: Other error in operation [ org.apache.openjpa.persistence.RollbackException: The transaction has been rolled back. See the nested exceptions for details on the errors that occurred.], {1}

      26. Hello Jayati,

        I followed the steps you describe in the Oozie setup and examples posts. The Oozie setup seems to work fine: the status shows as NORMAL and the web UI is also running fine.

        But when I try to run the examples with the command below, I get this exception:

        "Error: E0902 : E0902: Exception occured: [org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4]"

        Command:
        /usr/test/oozie2.3/oozie-2.3.0-CDH3B4/bin/oozie job -oozie http://localhost:11000/oozie -config /usr/test/oozie2.3/oozie-2.3.0-CDH3B4/examples/apps/map-reduce/job.properties -run



        I also tried running the examples with the different Oozie builds listed below, but I get the same exception in every case.

        oozie-3.3.2-cdh4.7.0.tar.gz
        oozie-3.2.0-cdh4.1.1.tar.gz
        oozie-2.3.0-CDH3B4.tar.gz
        oozie-latest.tar.gz

        Versions installed on my machine:

        Hadoop 2.0.0-cdh4.1.1
        Oozie client build version: 2.3.0-CDH3B4

        I am not sure what went wrong. Please help me get out of this problem.

        Regards,
        Praveen Sharma

      27. Hi Jayati, I am trying to start an Oozie job with Kerberos authentication enabled. I am getting the error "JA009: Can't get Master Kerberos principal for use as renewer". As suggested above, I added org.apache.oozie.service.HadoopAccessorService to oozie.services.ext, but the error is still the same. Could you please help? Thanks.

      28. This comment has been removed by the author.
