HBase can be installed in 3 modes:
- Standalone Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
The objective of this post is to provide a tried-and-true procedure for installing HBase in Fully-Distributed Mode. But before we dive into the nuts and bolts of the installation, here are a few HBase preliminaries worth covering up front.
1. When talking about installing HBase in fully distributed mode, we'll be addressing the following:
- HDFS: A running instance of HDFS is required for deploying HBase in distributed mode.
- HBase Master: The HBase cluster has a master-slave architecture in which the HBase Master is responsible for monitoring all the slaves, i.e. the Region Servers.
- Region Servers: These are the slave nodes responsible for storing and managing regions.
- Zookeeper Cluster: A distributed Apache HBase installation depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to access the running ZooKeeper ensemble.
2. There are two sides to the coin of setting up a Fully-Distributed HBase cluster:
- When Zookeeper cluster is managed by HBase internally
- When Zookeeper cluster is managed externally
3. HBase is particular about the DNS entries of its cluster nodes. Therefore, to avoid resolution discrepancies, we will assign hostnames to the cluster nodes and use them throughout the installation.
Deploying a Fully-Distributed HBase Cluster
Assumptions
For the purpose of clarity and ease of expression, I'll be assuming that we are setting up a cluster of 3 nodes with the following IP addresses:
10.10.10.1
10.10.10.2
10.10.10.3
where 10.10.10.1 would be the master and 10.10.10.2 and 10.10.10.3 would be the slaves/region servers.
Also, we'll be assuming that we have a running instance of HDFS, whose NameNode daemon is running on
10.10.10.4
Case I: When HBase manages the ZooKeeper ensemble
HBase by default manages a ZooKeeper "cluster" for you. It will start and stop the ZooKeeper ensemble as part of the HBase start/stop process.
Step 1: Assign hostnames to all the nodes of the cluster.
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.3 regionserver2
10.10.10.4 namenode
Now, on each node, append the required hostname entries to its /etc/hosts file.
On the NameNode (10.10.10.4) add:
10.10.10.4 namenode
On the Master Node (10.10.10.1) add:
10.10.10.1 master
10.10.10.4 namenode
10.10.10.2 regionserver1
10.10.10.3 regionserver2
On the Region Server 1 (10.10.10.2) add:
10.10.10.1 master
10.10.10.2 regionserver1
And on the Region Server 2 (10.10.10.3) add:
10.10.10.1 master
10.10.10.3 regionserver2
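As a quick sanity check (assuming the hostnames above), you can ping each hostname from every node and confirm it resolves to the intended IP:

ping -c 1 master          # should resolve to 10.10.10.1
ping -c 1 regionserver1   # should resolve to 10.10.10.2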
Step 2: Download a stable release of HBase from
http://apache.techartifact.com/mirror/hbase/
and untar it at a suitable location on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3).
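For example, the download and untar step might look like the sketch below; the 0.94.0 version number, the exact mirror path, and the /usr/local target directory are only assumptions, so substitute the release and location you actually use:

wget http://apache.techartifact.com/mirror/hbase/hbase-0.94.0/hbase-0.94.0.tar.gz
tar -xzf hbase-0.94.0.tar.gz -C /usr/local/
export HBASE_HOME=/usr/local/hbase-0.94.0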
Step 3: Edit the conf/hbase-env.sh file on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3) to set JAVA_HOME (e.g. /usr/lib/jvm/java-6-openjdk/) and to set HBASE_MANAGES_ZK to true, indicating that HBase should manage the ZooKeeper ensemble internally.
export JAVA_HOME=your_java_home
export HBASE_MANAGES_ZK=true
Step 4: Edit conf/hbase-site.xml on all the HBase cluster nodes so that after editing it looks like:
<configuration>
  <property>
    <name>hbase.master</name>
    <value>10.10.10.1:60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
Here the property 'hbase.master' specifies the host and port that the HBase master (10.10.10.1) runs at. Next is the 'hbase.rootdir' property, a directory shared by the region servers. Its value has to be an HDFS location, e.g. hdfs://namenode:9000/hbase. Since we have assigned 10.10.10.4 the hostname 'namenode' and its NameNode port is assumed to be 9000, the rootdir location becomes hdfs://namenode:9000/hbase. If your NameNode is running on some other port, replace 9000 with that port number.
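Before moving on, you can verify that the NameNode is reachable at the host and port used in hbase.rootdir; a simple check (assuming the Hadoop client is available on the node) is:

hadoop fs -ls hdfs://namenode:9000/     # should list the HDFS root without errors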
Step 5: Edit the conf/regionservers file on all the HBase cluster nodes, adding the hostnames of all the region server nodes. For example:
regionserver1
regionserver2
This completes the installation process for an HBase cluster with an internally managed ZooKeeper ensemble.
Case II: When the ZooKeeper ensemble is managed externally
We can manage the ZooKeeper ensemble independent of HBase and just point HBase at the cluster it should use.
Step 1: Assign hostnames to all the nodes of the HBase and ZooKeeper clusters. Assuming a two-node ZooKeeper cluster on 10.10.10.5 and 10.10.10.6, the full mapping is:
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.3 regionserver2
10.10.10.4 namenode
10.10.10.5 zkserver1
10.10.10.6 zkserver2
Now, on each node, append the required hostname entries to its /etc/hosts file.
On the NameNode (10.10.10.4) add:
10.10.10.4 namenode
On the Master Node (10.10.10.1) add:
10.10.10.1 master
10.10.10.4 namenode
10.10.10.2 regionserver1
10.10.10.3 regionserver2
10.10.10.5 zkserver1
10.10.10.6 zkserver2
On the Region Server 1 (10.10.10.2) add:
10.10.10.1 master
10.10.10.2 regionserver1
10.10.10.5 zkserver1
10.10.10.6 zkserver2
And on the Region Server 2 (10.10.10.3) add:
10.10.10.1 master
10.10.10.3 regionserver2
10.10.10.5 zkserver1
10.10.10.6 zkserver2
Step 2: Download a stable release of HBase from
http://apache.techartifact.com/mirror/hbase/
and untar it at a suitable location on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3).
Step 3: Edit the conf/hbase-env.sh file on all the HBase cluster nodes (10.10.10.1, 10.10.10.2, 10.10.10.3) to set JAVA_HOME (e.g. /usr/lib/jvm/java-6-openjdk/) and to set HBASE_MANAGES_ZK to false, indicating that the ZooKeeper ensemble will be managed externally.
export JAVA_HOME=your_java_home
export HBASE_MANAGES_ZK=false
Step 4: Edit conf/hbase-site.xml on all the HBase cluster nodes so that after editing it looks like:
<configuration>
  <property>
    <name>hbase.master</name>
    <value>10.10.10.1:60000</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zkserver1,zkserver2</value>
  </property>
</configuration>
Here the property 'hbase.master' specifies the host and port that the HBase master (10.10.10.1) runs at.
Next is the 'hbase.rootdir' property, a directory shared by the region servers. Its value has to be an HDFS location, e.g. hdfs://namenode:9000/hbase. Since we have assigned 10.10.10.4 the hostname 'namenode' and its NameNode port is assumed to be 9000, the rootdir location becomes hdfs://namenode:9000/hbase. If your NameNode is running on some other port, replace 9000 with that port number.
The property 'hbase.zookeeper.property.clientPort' mirrors a property from ZooKeeper's config file zoo.cfg: the port on which clients connect.
And lastly, the property 'hbase.zookeeper.quorum' is a comma-separated list of the servers in the ZooKeeper quorum.
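For reference, a minimal zoo.cfg on the ZooKeeper nodes consistent with the settings above might look like the sketch below; the dataDir path is an assumption, so point it wherever ZooKeeper should keep its state:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=zkserver1:2888:3888
server.2=zkserver2:2888:3888

Each ZooKeeper node also needs a myid file under dataDir containing its own server number (1 or 2).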
Step 5: Edit the conf/regionservers file on all the HBase cluster nodes, adding the hostnames of all the region server nodes. For example:
regionserver1
regionserver2
This brings us to the end of the installation process for an HBase cluster with an externally managed ZooKeeper ensemble.
Start your HBase Cluster
Having followed the steps above, it's now time to start the deployed cluster. If you have an externally managed ZooKeeper cluster, make sure to start it before you proceed further.
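Assuming a standard ZooKeeper installation under $ZOOKEEPER_HOME on each ZooKeeper node (zkserver1 and zkserver2 in our setup), the ensemble can be started with:

$ZOOKEEPER_HOME/bin/zkServer.sh start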
On the master node (10.10.10.1), cd to the HBase setup directory and run the following command:
$HBASE_HOME/bin/start-hbase.sh
This starts the master and the region servers on the respective nodes of the cluster.
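To verify that the daemons came up, run jps on each node: the master should show an HMaster process and the region servers an HRegionServer process (plus HQuorumPeer on the quorum nodes if HBase manages the ZooKeeper ensemble). You can also open the HBase shell and check the cluster status:

$HBASE_HOME/bin/hbase shell
hbase(main):001:0> status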
Stop your HBase Cluster
To stop a running cluster, cd to the HBase setup directory on the master node and run the command below. If you are using an externally managed ZooKeeper ensemble, stop it only after HBase has shut down completely.
$HBASE_HOME/bin/stop-hbase.sh
Hadoop-HBase Version Compatibility
In my assessment, Hadoop 1.0.4 and 1.0.3 work fine with all HBase releases in the 0.94.x and 0.92.x series, but not with the 0.90.x series or older releases, due to version incompatibility.
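Because HBase ships with a bundled Hadoop jar, a common source of such incompatibility is a mismatch between that jar and the jars running on your HDFS cluster. A commonly suggested remedy (the jar names below are illustrative, assuming Hadoop 1.0.4) is to replace the bundled jar with the one from your Hadoop installation on every HBase node:

rm $HBASE_HOME/lib/hadoop-core-*.jar
cp $HADOOP_HOME/hadoop-core-1.0.4.jar $HBASE_HOME/lib/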