Thursday, January 31, 2013

HBase Installation : Fully Distributed Mode

HBase can be installed in 3 modes:
  1. Standalone Mode
  2. Pseudo-Distributed Mode
  3. Fully-Distributed Mode
The objective of this post is to provide a tried-and-true procedure for installing HBase in Fully-Distributed Mode. Before we dive into the nuts and bolts of the installation, here are some HBase preliminaries that are worth covering first.

1. When talking about installing HBase in fully distributed mode we'll be addressing the following:
  • HDFS: A running instance of HDFS is required for deploying HBase in distributed mode.
  • HBase Master: An HBase cluster has a master-slave architecture in which the HBase Master is responsible for monitoring all the slaves, i.e. the Region Servers.
  • Region Servers: These are the slave nodes responsible for storing and managing regions.
  • Zookeeper Cluster: A distributed Apache HBase installation depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble.
2. There are two ways to set up a Fully Distributed HBase Cluster:
  • When Zookeeper cluster is managed by HBase internally
  • When Zookeeper cluster is managed externally
3. HBase is very particular about the DNS entries of its cluster nodes. Therefore, to avoid name resolution problems, we will assign host names to the cluster nodes and use them throughout the installation.

Deploying a Fully-Distributed HBase Cluster

Assumptions
For the purpose of clarity and ease of expression, I'll be assuming that we are setting up a cluster of 3 nodes with IP Addresses
10.10.10.1
10.10.10.2
10.10.10.3
where 10.10.10.1 would be the master and 10.10.10.2 and 10.10.10.3 would be the slaves/region servers.
Also, we'll be assuming that we have a running instance of HDFS, whose NameNode daemon is running on
10.10.10.4

Case I- When HBase manages the Zookeeper ensemble
HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process.

Step 1: Assign hostnames to all the nodes of the cluster.
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.3 regionserver2

10.10.10.4 namenode                                                                        
Now, on each of these nodes, append the required hostnames to the /etc/hosts file.
On the Namenode (10.10.10.4) add:
10.10.10.4 namenode                                                                        

On the Master Node(10.10.10.1) add:
10.10.10.1 master
10.10.10.4 namenode
10.10.10.2 regionserver1
10.10.10.3 regionserver2                                                                   

On the Region Server 1(10.10.10.2) add:
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.4 namenode

And on the Region Server 2(10.10.10.3) add:
10.10.10.1 master
10.10.10.3 regionserver2
10.10.10.4 namenode
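
If you would rather not type these entries by hand on every node, the mapping above can be generated. What follows is only a minimal sketch in Python: the CLUSTER dictionary and the hosts_entries function are hypothetical names of my own, not part of HBase. It also assumes that every HBase node should be able to resolve 'namenode', since the region servers read and write HDFS directly.

```python
# Hypothetical helper for illustration; nothing here is part of HBase itself.
CLUSTER = {
    "master": "10.10.10.1",
    "regionserver1": "10.10.10.2",
    "regionserver2": "10.10.10.3",
    "namenode": "10.10.10.4",
}


def hosts_entries(node, cluster=CLUSTER):
    """Return the /etc/hosts lines that the given node should carry."""
    if node == "namenode":
        # The NameNode only needs to know its own name.
        needed = ["namenode"]
    elif node == "master":
        # The master talks to every other node in the cluster.
        needed = ["master", "namenode", "regionserver1", "regionserver2"]
    else:
        # Assumption: a region server needs the master, itself and the NameNode.
        needed = ["master", node, "namenode"]
    return ["%s %s" % (cluster[name], name) for name in needed]


if __name__ == "__main__":
    for line in hosts_entries("regionserver1"):
        print(line)
```

The output for each node can then be appended to that node's /etc/hosts file.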

Step 2: Download a stable release of HBase from
http://apache.techartifact.com/mirror/hbase/
and untar it at a suitable location on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3).

Step 3: Edit the conf/hbase-env.sh file on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3) to set JAVA_HOME (for example, /usr/lib/jvm/java-6-openjdk/) and to set HBASE_MANAGES_ZK to true, which tells HBase to manage the ZooKeeper ensemble internally.
export JAVA_HOME=your_java_home
export HBASE_MANAGES_ZK=true
                                            

Step 4: Edit conf/hbase-site.xml on all the HBase cluster nodes. After editing, it should look like:
<configuration>
<property>
    <name>hbase.master</name>
    <value>10.10.10.1:60000</value>
</property>
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
</property>
</configuration>

Here the property 'hbase.master' reflects the host and port that the HBase master (10.10.10.1) runs at. Next is the 'hbase.rootdir' property, which is a directory shared by the region servers. The value has to be an HDFS location, for example hdfs://namenode:9000/hbase. Since we have assigned the hostname 'namenode' to 10.10.10.4, and its NameNode port is assumed to be 9000, the rootdir location becomes hdfs://namenode:9000/hbase. If your NameNode is running on some other port, replace 9000 with that port number. Finally, 'hbase.zookeeper.quorum' lists the host(s) on which HBase should run ZooKeeper. It defaults to localhost, so if it is left out of a multi-node cluster each region server looks for ZooKeeper on its own machine and shuts down. Here ZooKeeper runs on the master node.
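
Since the same hbase-site.xml has to be placed on every node, it can be convenient to generate it rather than edit it by hand. Below is a minimal sketch in Python using the standard xml.etree.ElementTree module. The build_hbase_site function is a hypothetical helper of my own, and the property values are simply the ones assumed in this post.

```python
import xml.etree.ElementTree as ET


def build_hbase_site(properties):
    """Render a dict of property names and values as hbase-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Values taken from the assumptions of this post; adjust them for your cluster.
site_xml = build_hbase_site({
    "hbase.master": "10.10.10.1:60000",
    "hbase.rootdir": "hdfs://namenode:9000/hbase",
    "hbase.cluster.distributed": "true",
})
print(site_xml)
```

Any further properties can be added to the dictionary in the same way before the file is written to conf/hbase-site.xml.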

Step 5: Edit the conf/regionservers file on all the HBase cluster nodes. Add the hostnames of all the region server nodes, one per line. For example:
regionserver1
regionserver2                                                                                     
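
A common slip at this point is to list a region server whose name the master cannot resolve. As a rough check, assuming both files use the simple layouts shown in this post, the sketch below compares the two. The function name unknown_region_servers is hypothetical and is not an HBase tool.

```python
def unknown_region_servers(regionservers_text, hosts_text):
    """Return region server names that have no entry in the hosts text."""
    known = set()
    for line in hosts_text.splitlines():
        # Drop anything after '#', then skip the IP and keep the names.
        fields = line.split("#", 1)[0].split()
        known.update(fields[1:])
    wanted = [l.strip() for l in regionservers_text.splitlines() if l.strip()]
    return [name for name in wanted if name not in known]


if __name__ == "__main__":
    hosts = "10.10.10.1 master\n10.10.10.2 regionserver1\n"
    servers = "regionserver1\nregionserver2\n"
    print(unknown_region_servers(servers, hosts))
```

An empty list means every name in conf/regionservers has a matching /etc/hosts entry.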


This completes the installation process of HBase Cluster with Zookeeper Ensemble being managed internally.

Case II- When the Zookeeper ensemble is managed externally
We can manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use.

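Step 1: Assign hostnames to all the nodes of the HBase and ZooKeeper clusters. Here we assume a two-node ZooKeeper cluster with nodes 10.10.10.5 and 10.10.10.6. (ZooKeeper generally recommends an odd number of servers; two are used here only for illustration.)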
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.3 regionserver2
10.10.10.4 namenode
10.10.10.5 zkserver1
10.10.10.6 zkserver2                                                                      
  

Now, on each of these nodes, append the required hostnames to the /etc/hosts file.
On the Namenode (10.10.10.4) add:
10.10.10.4 namenode                                                                        

On the Master Node(10.10.10.1) add:
10.10.10.1 master
10.10.10.4 namenode
10.10.10.2 regionserver1
10.10.10.3 regionserver2
10.10.10.5 zkserver1
10.10.10.6 zkserver2                                                                         

On the Region Server 1(10.10.10.2) add:
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.4 namenode
10.10.10.5 zkserver1
10.10.10.6 zkserver2

And on the Region Server 2(10.10.10.3) add:
10.10.10.1 master
10.10.10.3 regionserver2
10.10.10.4 namenode
10.10.10.5 zkserver1
10.10.10.6 zkserver2

Step 2: Download a stable release of HBase from
http://apache.techartifact.com/mirror/hbase/
and untar it at a suitable location on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3).

Step 3: Edit the conf/hbase-env.sh file on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3) to set JAVA_HOME (for example, /usr/lib/jvm/java-6-openjdk/) and to set HBASE_MANAGES_ZK to false, which tells HBase that the ZooKeeper ensemble is managed externally.
export JAVA_HOME=your_java_home
export HBASE_MANAGES_ZK=false                                            

Step 4: Edit conf/hbase-site.xml on all the HBase cluster nodes. After editing, it should look like:
<configuration>
<property>
    <name>hbase.master</name>
    <value>10.10.10.1:60000</value>
</property>
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>zkserver1,zkserver2</value>
</property>
</configuration>

Here the property 'hbase.master' reflects the host and port that the HBase master (10.10.10.1) runs at.
Next is the 'hbase.rootdir' property, which is a directory shared by the region servers. The value has to be an HDFS location, for example hdfs://namenode:9000/hbase. Since we have assigned the hostname 'namenode' to 10.10.10.4, and its NameNode port is assumed to be 9000, the rootdir location becomes hdfs://namenode:9000/hbase. If your NameNode is running on some other port, replace 9000 with that port number.
The property 'hbase.zookeeper.property.clientPort' reflects a property from ZooKeeper's config zoo.cfg. It is the port at which the clients will connect.
And lastly the property 'hbase.zookeeper.quorum' is a comma separated list of servers in the ZooKeeper Quorum.
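
To see how these two properties fit together, here is a small sketch that combines them into the host:port,host:port form that ZooKeeper clients use as a connection string. The zk_connect_string function is a hypothetical helper of my own and only approximates what HBase does internally.

```python
def zk_connect_string(quorum, client_port="2181"):
    """Combine a comma separated quorum and a client port into host:port pairs."""
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    return ",".join("%s:%s" % (host, client_port) for host in hosts)


if __name__ == "__main__":
    # Values from the hbase-site.xml shown in this post.
    print(zk_connect_string("zkserver1,zkserver2", "2181"))
```

Every HBase node and client must be able to resolve and reach each host in that string.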

Step 5: Edit the conf/regionservers file on all the HBase cluster nodes. Add the hostnames of all the region server nodes, one per line. For example:
regionserver1
regionserver2                                                                                     

This brings us to the end of the installation process for an HBase cluster with an externally managed ZooKeeper ensemble.

Start your HBase Cluster
Having followed the steps above, it's now time to start the deployed cluster. If you have an externally managed ZooKeeper cluster, make sure to start it before you proceed further.
On the master node (10.10.10.1), cd to the HBase setup and run the following command:
$HBASE_HOME/bin/start-hbase.sh                                                 
This starts the master and all the region servers on their respective nodes of the cluster.
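
Once the script returns, a quick way to confirm that the master actually came up is to check that its port accepts connections. The sketch below is a generic TCP check written in Python, not an HBase utility. The function name port_is_open is my own, and the port 60000 mentioned afterwards is the one configured in hbase.master above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, calling port_is_open("master", 60000) from any cluster node should return True once the master is running. A False result is a hint to look at the logs under the HBase logs directory.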

Stop your HBase Cluster
To stop a running cluster, on the master node, cd to the hbase setup and run
$HBASE_HOME/bin/stop-hbase.sh                                                 

Hadoop-HBase Version Compatibility
As per my assessment, Hadoop 1.0.3 and 1.0.4 work fine with all HBase releases in the 0.94.x and 0.92.x series, but not with the 0.90.x series or older releases, due to version incompatibility.

37 comments:

  1. Great blog, simple and to the point.

    Thanks
    Rishabh Shah

  2. A few things to make life simpler:
    1. If you have a different dedicated user for Hadoop, change the owner of the HBase installation directory to the user running Hadoop and HBase.
    2. The properties mentioned above for hbase-site.xml are missing the ZooKeeper port and quorum properties, which can be as follows:

    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2222</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>cloudera3</value>
    </property>

    If these properties are missing, the region servers will start and then shut down immediately.

    that's all...
    thanks
    rishabh shah

  3. Hello Rishabh,

    Thanks for your inputs, but the properties specifying zookeeper quorum and client port are reqd. only in the "Case II- When the Zookeeper ensemble is managed externally".

    I have mentioned them in that section. I hope this is what you meant.

  4. HI Jayati

    even in case of internally quorum needs to be specified,port can be neglected if default port is being used.

    while connecting hbase from client we have to specify quorum details to connect with hbase.

    Thanks
    Gaurhari

    1. Hello Gaurhari,

      When using ZooKeeper internally you need not specify the "quorum" since you don't have one. Of course, it's optional for you to do that, but definitely not mandatory. If you do so, all you have to specify is localhost.

      Connecting from HBase Client, is an altogether different scenario, here we are concerned about the installation.

    2. If you don't specify the "quorum", the regionservers try to connect to localhost and fail. So the quorum is required. Except that, easy and simple explanation. Thanks.

  5. This comment has been removed by the author.

    1. Hi,

      I might be able to make out the reason why our HMaster and HRegionservers die off if I could see the logs.

    2. My bad. I had not started zookeeper before starting hbase cluster. Thanks for tutorial.

  6. Shouldn't zookeeper quorum consist of odd number of servers?

    1. Yes that is one of the zk specific optimizations. It should have 1 or 3 servers in the quorum. But an even number works just fine.

  7. Hi,

    Would it be possible to have Region server and Data node on different machines ???

    Regards,
    Ankit

    1. Hi,

      Great .....!!!

      Can you help me with this ?..... I am getting suggestions that this won't be possible (As HMaster,NameNode & Zookeeper will run on one machine and DataNode & RegionServer on one machine (with multiple such systems)) ..... I want that NameNode, HMaster, Zookeeper, DataNode, & RegionServer all should be on different servers (need that setup for some experiments) ...... Sorry, but, I am new to NoSQL systems and want to do this installations ..... would be great if you can guide me with this ... :)

      Regards,
      Ankit

      P.S.: My theoretical basics about HBase and Hadoop are clear but never did any NoSQL installation before.

  8. Few Doubts

    - Is it neccessary to setup the Hadoop cluster first then only we can setup the Hbase Cluster setup
    - When we install is it necessary to install hadoop in all nodes

    help me out

    1. Yes .. HBase needs a Hadoop Namenode to connect to and so you need to install Hadoop. That can be a pseudo-distributed installation as well .. that is a cluster is not mandatory.

      Regarding the necessity of installing Hadoop on all the nodes on which you are planning to install HBase, it is not required.

      The Hadoop Namenode can be on any remote machine, infact for production, you should not do this since it would degrade the performance.

  9. Hey Jayati,

    Thanks a lot for providing detailed set up instructions of HBase. I tried the steps you mentioned on Amazon EC2 and it worked well.

    Do you have any example that we can try on HBase Cluster.


    Thanks,
    Sanjay

    1. One more thing. In the case of option 1 (HBase manages ZooKeeper internally), I am facing an issue: the region server looks for ZooKeeper on its own system (localhost) instead of contacting the ZooKeeper process (HQuorumPeer) started on the HMaster. Since the ZooKeeper process (HQuorumPeer) is started on the HMaster and the region server looks for a ZooKeeper server locally, the HRegionServer process fails on the region server.

      Can you please tell me how to make the region server look for ZooKeeper on the master server instead of checking locally?
  10. Hello Jayathi,

    Thanks for nice post.......

    can you help me on this

    I have set java path and zkManaged to true in hadoop-env.sh

    hbase-site.xml hbase.cluster.distributed to true

    HMaster and HRegionServers are stopping automatically. Please help me with this.

  11. Hi Jayathi,

    I am new in hbase

    My question is when i am executing "bin/stop-hbase.sh" by master then its stop all region server ?

    As my setup i have two region server one in my local and other is in other system

    Right now i am facing problem when i stop hbase master then it's not stop region server this behavior is default by hbase or i am missing with some configuration ?

  12. ubuntu@namenode:~/hbase/bin$ sudo ./start-hbase.sh
    Error: Could not find or load main class org.apache.hadoop.hbase.util.HBaseConfTool
    Error: Could not find or load main class org.apache.hadoop.hbase.zookeeper.ZKServerTool
    starting master, logging to /home/ubuntu/hbase/bin/../logs/hbase-root-master-namenode.out
    Error: Could not find or load main class org.apache.hadoop.hbase.master.HMaster
    secondary: Permission denied (publickey).
    The authenticity of host 'slave2 (10.169.59.16)' can't be established.
    ECDSA key fingerprint is bc:57:d7:8e:bb:ee:bf:2c:a6:3d:97:d1:06:d4:c7:90.
    Are you sure you want to continue connecting (yes/no)? The authenticity of host 'slave1 (10.164.169.217)' can't be established.
    ECDSA key fingerprint is bc:a2:3d:b5:1a:fd:24:85:61:12:df:49:a3:3e:12:9e.
    Are you sure you want to continue connecting (yes/no)? #localhost: ssh: Could not resolve hostname #localhost: Temporary failure in name resolution
    The authenticity of host 'namenode (10.233.58.19)' can't be established.
    ECDSA key fingerprint is e3:cd:0f:7f:01:4a:78:52:7f:79:c6:8e:5b:c3:02:cf.
    Are you sure you want to continue connecting (yes/no)? yes
    slave2: Warning: Permanently added 'slave2,10.169.59.16' (ECDSA) to the list of known hosts.
    slave2: Permission denied (publickey).

    slave1: Host key verification failed.

    namenode: Host key verification failed.



  13. Hello, I am new to Hadoop technology.

    I have successfully installed a Hadoop single-node cluster and a multi-node cluster as well, but now that I am installing HBase I am facing some unknown error. Could you please mention the steps for installing HBase on Ubuntu?

    Thanks in advance
  14. Hi Jayati,

    Thanks for the post related to Hbase cluster.
    Did you tried the above steps in windows environment without cygwin support?

    Regards,
    Nani.

  15. Hi, Jayati,
    I followed your instructions, and successfully start hbase.
    I do not receive any error messages, and can confirm that all required daemons are running both in master and region servers.
    But when I am start hbase shell and try to create a table, I am getting an error
    ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
    at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:1869)
    at org.apache.hadoop.hbase.master.HMaster.checkNamespaceManagerReady(HMaster.java:1874)
    at org.apache.hadoop.hbase.master.HMaster.ensureNamespaceExists(HMaster.java:2067)
    at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1262)
    at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:398)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:42436)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
    at java.lang.Thread.run(Thread.java:745)

    Could you please advice me, how I can solve this problem, thank you!

  16. Great post! A distributed Apache HBase (TM) installation depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble. Apache HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process. You can also manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use. To toggle HBase management of ZooKeeper, use the HBASE_MANAGES_ZK variable in conf/hbase-env.sh. This variable, which defaults to true, tells HBase whether to start/stop the ZooKeeper ensemble servers as part of HBase start/stop.

  18. Hello Jayathi, i followed ur steps but iam getting error like this, while creating tables through Hbase shell

    ERROR [main] client.ConnectionManager$HConnectionImplementation: The node /hbase is not in ZooKeeper. It should have been written by the master. Check the value configured in 'zookeeper.znode.parent'. There could be a mismatch with the one configured in the master.

    Thanks in advance

  19. Apart from learning more about Hadoop at hadoop online training, this blog adds to my learning platforms. Great work done by the webmasters. Thanks for your research and experience sharing on a platform like this.

  20. Can we start hbase from any of the regionservers not from the master?

  21. Hi, I'm facing a problem. I have installed HBase in fully distributed mode, but the web UI is not showing up, even at ports 60000, 60010, 60030 and 16010. I have tried all of them with no success. I have also run jps and the processes don't show up either, yet start-hbase.sh appears to work properly. I have two region servers and a master.

  22. Thanks for sharing the post. Very helpful. :)

  23. Thanks Jayati..Its a wonderful post.

    By the way, what Rishabh says above is correct. We have to add entries to hbase-site.xml for the ZooKeeper machine and port. Otherwise, as he said, the RegionServers will keep shutting down.

  24. Hi.... I am try to install hbase nd i had changes all the configuration as per given but while starting hbase shell i had been given following error:

    org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is not running yet
    at org.apache.hadoop.hbase.master.HMaster.checkServiceStarted(HMaster.java:2261)
    at org.apache.hadoop.hbase.master.MasterRpcServices.isMasterRunning(MasterRpcServices.java:930)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:55654)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
    at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
    at java.lang.Thread.run(Thread.java:745)


  26. Great article, thank you very much.
