Saturday, May 25, 2013

Installing Mahout on Linux

Mahout is a collection of highly scalable machine learning algorithms for very large data sets. Although the real power of Mahout shows only on large data stored in HDFS, Mahout also supports running algorithms on local filesystem data, which can help you get a feel for how Mahout algorithms are run.

Before you can run any Mahout algorithm, you need a Mahout installation on your Linux machine. You can set one up in either of the two ways described below:

Method I- Extracting the tarball

Yes, it is that simple. Just download the latest Mahout release tarball.
Extract the downloaded tarball using:

tar -xzvf /path_to_downloaded_tarball/mahout-distribution-0.x.tar.gz
This creates a folder named mahout-distribution-0.x in the directory where you ran the command (that is, /path_to_downloaded_tarball/mahout-distribution-0.x if you ran it there).
Now you can run any of the algorithms using the script “bin/mahout” in the extracted folder. To test your installation, you can also run this script without any other arguments.
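
As a minimal sketch of a sanity check for Method I, the shell function below reports whether a folder contains the bin/mahout script mentioned above. The function name looks_like_mahout_dir is hypothetical and not part of Mahout, and it only checks that the file exists, not that Mahout actually runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): report whether a directory
# looks like an extracted Mahout distribution.
looks_like_mahout_dir() {
    dir="$1"
    # The launcher script is expected at bin/mahout inside the extracted folder.
    if [ -f "$dir/bin/mahout" ]; then
        echo "ok: found $dir/bin/mahout"
        return 0
    fi
    echo "missing: $dir/bin/mahout"
    return 1
}
```

You would call it with the extracted folder as the only argument, for example looks_like_mahout_dir /path_to_downloaded_tarball/mahout-distribution-0.x.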

Method II- Building Mahout

1. Prerequisites for Building Mahout
 -   Java JDK 1.6
 -   Maven 2.2 or higher

Install Maven and Subversion (svn) using the following commands:
sudo apt-get install maven2                                                                

sudo apt-get install subversion                                                                                                    

2. Create a directory where you want to check out the Mahout code (we’ll call it MAHOUT_HOME here), then change into it:
cd MAHOUT_HOME                                                                                                              

3. Use Subversion to check out the code:
svn co                                                                     

4. Compiling

mvn -DskipTests install                                                                                                           

5. Setting the environment variables

export MAHOUT_HOME=/location_of_checked_out_mahout
export PATH=$PATH:$MAHOUT_HOME/bin
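
Before starting Method II, it may help to confirm that the prerequisite tools are on your PATH. The sketch below uses a hypothetical helper, missing_tools, that prints each command it cannot find. The command names java, mvn, and svn are the usual executables for the JDK, Maven, and Subversion, which is an assumption about your setup.

```shell
#!/bin/sh
# Hypothetical preflight helper for the build steps above.
# Prints the name of every given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example call (assumed executable names):
# missing_tools java mvn svn
```

If the call prints nothing, all the listed tools were found.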

After following either of the above methods, you can run any of the available Mahout algorithms with the appropriate arguments. Note that you can run an algorithm over HDFS data or over local filesystem data. To run algorithms over data on your local filesystem, set an environment variable named “MAHOUT_LOCAL” to anything other than an empty string. That forces Mahout to run locally even if HADOOP_CONF_DIR and HADOOP_HOME are set.
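
The rule in the paragraph above can be sketched as a small shell function. This is an illustration only, not the actual logic of the bin/mahout script: the name mahout_run_mode is hypothetical, and falling back to local mode when no Hadoop variables are set is an assumption.

```shell
#!/bin/sh
# Illustration only: mirrors the rule described above, not the real bin/mahout code.
# Prints "local" or "hadoop" depending on the current environment.
mahout_run_mode() {
    if [ -n "${MAHOUT_LOCAL:-}" ]; then
        # Any non-empty value forces local execution.
        echo "local"
    elif [ -n "${HADOOP_HOME:-}" ] || [ -n "${HADOOP_CONF_DIR:-}" ]; then
        # Hadoop is configured and MAHOUT_LOCAL is empty or unset.
        echo "hadoop"
    else
        # Assumption: with no Hadoop settings, run locally.
        echo "local"
    fi
}
```

For example, after export MAHOUT_LOCAL=true, calling mahout_run_mode prints local regardless of the Hadoop variables.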
To dive into Mahout by running an algorithm, you can refer to my next post. I hope this proved to be a good starting point for you.
All the best!


  2. Hi,
    I am running a recommendation system on single-node Hadoop using Mahout. It is run on movie data obtained from GroupLens (100k data).
    hadoop version - 1.1.1

    I am executing the following command -

    hadoop jar /home/avatar/Desktop/Dissertation/Mahout/mahout-distribution-0.9/mahout-core-0.9-job.jar -s SIMILARITY_COOCCURRENCE --input /user/hduser/mahout/ --output /user/hduser/mahout/output

    After a few successful mapreduce tasks, the following error is thrown by each job-
    14/02/26 15:10:48 INFO mapred.JobClient: Task Id : attempt_201402261501_0007_m_000000_0, Status : FAILED
    Error: org.apache.lucene.util.PriorityQueue.<init>(I)V

    What does this error mean, and how do I get past it?
    Thanks in advance!

  3. Hi,

    I am finding it a bit difficult to diagnose it with the given amount of info.

    Please try looking at the logs to reach the root of the error.

  5. Hi, have you tried installing Mahout 0.9? There is no pom.xml.

  6. Hi,

    No, not yet. I'll get back once I get to try 0.9.


  7. Nice tutorial...
    I am facing a problem with Mahout 0.9 and hadoop-2.2.0:

    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
    at org.apache.mahout.common.HadoopUtil.getCustomJobName(
    at org.apache.mahout.common.AbstractJob.prepareJob(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(
    at org.apache.hadoop.util.ProgramDriver.driver(
    at org.apache.mahout.driver.MahoutDriver.main(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
    at org.apache.hadoop.util.RunJar.main(

  8. Have you worked with any other machine learning algorithms using Mahout?

  9. Hi ma'am, I installed Mahout but I am confused about how to use it. Does it open on localhost, or is there some other process for using it?

  10. Hi, when I tried to run an example sh script, the data got downloaded into tmp/mahout-work-root/20news-all.
    But I am getting an error:
    put: `/tmp/mahout-work-root/20news-all': No such file or directory

    Kindly help me out.

    Thank you so much in advance.