Mahout is a collection of highly scalable machine learning algorithms for very large data sets. Although the real power of Mahout shows only on large HDFS data, Mahout also supports running algorithms on local filesystem data, which can help you get a feel for how to run them.
Installing Mahout on Linux
Before you can run any Mahout algorithm, you need a Mahout installation on your Linux machine. You can set one up in either of the two ways described below:
Method I- Extracting the tarball
Yes, it is that simple. Just download the latest Mahout release from the Apache Mahout download page.
Extract the downloaded tarball using:
tar -xzvf /path_to_downloaded_tarball/mahout-distribution-0.x.tar.gz
This should result in a folder named /path_to_downloaded_tarball/mahout-distribution-0.x
Now you can run any of the algorithms using the script “bin/mahout” present in the extracted folder. To test your installation, you can also run the script without any other arguments.
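As a rough sketch of that installation check, the snippet below looks for the script before running it. MAHOUT_DIR and INSTALL_STATUS are hypothetical names used only for illustration, and the default path is the placeholder from the extraction step, so point it at your own folder.

```shell
# MAHOUT_DIR is a placeholder (an assumption); set it to your extracted folder.
MAHOUT_DIR="${MAHOUT_DIR:-/path_to_downloaded_tarball/mahout-distribution-0.x}"

if [ -x "$MAHOUT_DIR/bin/mahout" ]; then
    INSTALL_STATUS="found"
    # Running the script with no arguments is the test described above.
    "$MAHOUT_DIR/bin/mahout"
else
    INSTALL_STATUS="missing"
    echo "bin/mahout not found under $MAHOUT_DIR; check the extraction step" >&2
fi
```

If the message about a missing script appears, the tarball was probably extracted somewhere other than the folder MAHOUT_DIR points to.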
Method II- Building Mahout
1. Prerequisites for Building Mahout
- Java JDK 1.6
- Maven 2.2 or higher (http://maven.apache.org/)
Install Maven and Subversion using the following commands:
sudo apt-get install maven2
sudo apt-get install subversion
2. Create a directory into which you want to check out the Mahout code; we’ll call it MAHOUT_HOME here:
3. Use Subversion to check out the code:
svn co http://svn.apache.org/repos/asf/mahout/trunk
4. Build Mahout with Maven, skipping the tests:
mvn -DskipTests install
5. Setting the environment variables
After following either of the above methods, you can run any of the available Mahout algorithms with the appropriate arguments. Note that you can run an algorithm over HDFS data or over local filesystem data. To run algorithms over data on your local filesystem, set an environment variable named “MAHOUT_LOCAL” to anything other than an empty string. This forces Mahout to run locally even if HADOOP_CONF_DIR and HADOOP_HOME are set.
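Here is a minimal sketch of setting that variable in a shell session. The value "true" is an arbitrary choice, since any non-empty string should do, and MAHOUT_MODE is a hypothetical helper variable used only to echo the outcome; it is not something Mahout itself reads.

```shell
# Any non-empty value works; "true" is just a readable choice (an assumption).
export MAHOUT_LOCAL=true

# MAHOUT_MODE is a hypothetical name used here only to report the result.
if [ -n "$MAHOUT_LOCAL" ]; then
    MAHOUT_MODE="local"
else
    MAHOUT_MODE="decided by your Hadoop settings"
fi
echo "Mahout run mode: $MAHOUT_MODE"

# To go back to running over HDFS later, remove the variable:
# unset MAHOUT_LOCAL
```

Because the variable is exported, it applies to every bin/mahout command started from the same shell session. Add the export line to your shell profile if you want it to persist.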
To plunge into Mahout by actually running an algorithm, you can refer to my next post. I hope this proved to be a good starter for you.
All the best!