Jayati Tiwari: Installing Mahout on Linux

Saturday, May 25, 2013

Installing Mahout on Linux

Mahout is a collection of highly scalable machine learning algorithms for very large data sets. Although the real power of Mahout shows only on large data sets stored in HDFS, Mahout also supports running algorithms on local filesystem data, which can help you get a feel for how to run them.

Before you can run any Mahout algorithm, you need a Mahout installation on your Linux machine. You can set one up in either of the two ways described below:

Method I- Extracting the tarball

Yes, it is that simple. Just download the latest Mahout release from

http://www.apache.org/dyn/closer.cgi/mahout/

Extract the downloaded tarball using:

tar -xzvf /path_to_downloaded_tarball/mahout-distribution-0.x.tar.gz

This should result in a folder named mahout-distribution-0.x in your current working directory.

Now you can run any of the algorithms using the script “bin/mahout” in the extracted folder. To test your installation, you can also run

bin/mahout

without any other arguments. It should print the list of programs it can run.
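If you would rather not depend on which directory you happen to be in, the extraction step can be wrapped in a small shell function. This is only a sketch: the function name extract_mahout and the destination directory are my own placeholders, and it assumes the archive contains a bin/mahout script as described above.

```shell
#!/bin/sh
# Sketch: extract a Mahout tarball into a chosen directory and report
# where the launcher script ended up.
# extract_mahout is a hypothetical helper name, not part of Mahout.
extract_mahout() {
    tarball=$1   # path to the downloaded .tar.gz
    dest=$2      # directory to extract into (created if missing)

    mkdir -p "$dest" || return 1

    # -C makes tar extract into $dest instead of the current directory.
    tar -xzf "$tarball" -C "$dest" || return 1

    # Print the first bin/mahout found under $dest, if any.
    find "$dest" -type f -path '*/bin/mahout' | head -n 1
}
```

For example, extract_mahout /path_to_downloaded_tarball/mahout-distribution-0.x.tar.gz "$HOME/opt" would print the path of the extracted launcher, which you can then run without arguments to test the installation.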

Method II- Building Mahout

1. Prerequisites for Building Mahout

- Java JDK 1.6

- Maven 2.2 or higher (http://maven.apache.org/)

Install Maven and Subversion using the following commands:

sudo apt-get install maven2

sudo apt-get install subversion
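Before starting the build, it can save time to confirm that the tools are actually on your PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name, and it only checks that the commands exist, not that their versions meet the requirements listed above.

```shell
#!/bin/sh
# Print the name of every command in the argument list that cannot be
# found on PATH. missing_tools is a hypothetical helper, not a standard tool.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the three tools this method relies on.
missing=$(missing_tools java mvn svn)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "java, mvn and svn are all available"
fi
```

If anything is reported as missing, install it first and rerun the check before moving on to the checkout.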

2. Create a directory into which you want to check out the Mahout code; we’ll call it MAHOUT_HOME here:

mkdir MAHOUT_HOME

cd MAHOUT_HOME

3. Use Subversion to check out the code:

svn co http://svn.apache.org/repos/asf/mahout/trunk

4. Compiling (the checkout in step 3 creates a directory named trunk inside MAHOUT_HOME)

cd trunk

mvn -DskipTests install

5. Setting the environment variables

export HADOOP_CONF_DIR=$HADOOP_HOME/conf

export MAHOUT_HOME=/location_of_checked_out_mahout

export PATH=$PATH:$MAHOUT_HOME/bin
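A quick sanity check of these variables can catch a wrong path early. The snippet below is a sketch under one assumption: that the launcher lives at bin/mahout under MAHOUT_HOME, as in Method I. The function name is_mahout_home is hypothetical.

```shell
#!/bin/sh
# Return success if the given directory looks like a usable Mahout home,
# i.e. it contains an executable bin/mahout script.
# is_mahout_home is a hypothetical helper, not part of Mahout.
is_mahout_home() {
    [ -n "$1" ] && [ -x "$1/bin/mahout" ]
}

if is_mahout_home "${MAHOUT_HOME:-}"; then
    echo "MAHOUT_HOME looks fine: $MAHOUT_HOME"
else
    echo "MAHOUT_HOME is unset or has no executable bin/mahout" >&2
fi
```

If the check fails, recheck the path given in the export line for MAHOUT_HOME before trying to run any algorithm.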

After following either of the above methods, you can run any of the available Mahout algorithms with the appropriate arguments. Note that you can run an algorithm over HDFS data or over local file system data. To run over data on your local file system, set an environment variable named “MAHOUT_LOCAL” to anything other than an empty string. That forces Mahout to run locally even if HADOOP_CONF_DIR and HADOOP_HOME are set.
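The rule for choosing between local and Hadoop execution can be pictured with a few lines of shell. This is a sketch of the rule as described in this post, not the launcher's actual source; the function name mahout_mode is hypothetical, and the fallback to local mode when HADOOP_HOME is unset is my assumption.

```shell
#!/bin/sh
# Sketch of the mode-selection rule described above (not Mahout's own code).
# mahout_mode is a hypothetical helper that prints "local" or "hadoop".
mahout_mode() {
    if [ -n "${MAHOUT_LOCAL:-}" ]; then
        # Any non-empty value forces local mode, even with Hadoop configured.
        echo local
    elif [ -n "${HADOOP_HOME:-}" ]; then
        echo hadoop
    else
        # Assumption: with no Hadoop configured, the run stays local.
        echo local
    fi
}

# Setting the variable for a whole shell session:
#   export MAHOUT_LOCAL=true
# Setting it for a single command only:
#   MAHOUT_LOCAL=true bin/mahout
```

The single-command form is handy when you normally work against HDFS and only occasionally want to try something on local files.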

To plunge into Mahout by actually running an algorithm, you can refer to my next post. I hope this proved to be a good starter for you.

All the best !!!

13 comments:

Anirban Chakraborty, June 18, 2013 at 2:21 AM
thanks for the help
AvAtAr, February 26, 2014 at 2:23 AM
Hi,
I am running a recommendation system on a single-node Hadoop setup using Mahout. It runs on movie data obtained from GroupLens (100k data).
Versions:
hadoop version - 1.1.1
mahout-distribution-0.9

I am executing the following command -

hadoop jar /home/avatar/Desktop/Dissertation/Mahout/mahout-distribution-0.9/mahout-core-0.9-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input /user/hduser/mahout/u.data --output /user/hduser/mahout/output

After a few successful mapreduce tasks, the following error is thrown by each job-
14/02/26 15:10:48 INFO mapred.JobClient: Task Id : attempt_201402261501_0007_m_000000_0, Status : FAILED
Error: org.apache.lucene.util.PriorityQueue.<init>(I)V

What does this error mean, and how can I get past it?
Thanks in advance!
Jayati, February 27, 2014 at 5:47 AM
Hi,

I am finding this a bit difficult to diagnose with the given amount of info.

Please try looking at the logs to reach the root of the error.
Unknown, March 13, 2014 at 11:34 PM
Hi, have you tried installing Mahout 0.9? There is no pom.xml.
Jayati, March 18, 2014 at 2:32 AM
Hi,

No, not yet. I'll get back once I get to try 0.9.

Jayati
Samitha_Kumara, April 10, 2014 at 1:51 AM
thanks. clear and concise
Unknown, August 4, 2014 at 11:03 PM
Nice tutorial...
I am facing a problem with Mahout 0.9 and hadoop-2.2.0:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at org.apache.mahout.common.HadoopUtil.getCustomJobName(HadoopUtil.java:174)
at org.apache.mahout.common.AbstractJob.prepareJob(AbstractJob.java:614)
at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:73)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Unknown, July 9, 2015 at 1:30 AM
Have you worked with any other machine learning algorithms using Mahout?
Unknown, December 28, 2015 at 5:10 AM
Hi ma'am, I installed Mahout but I am confused about how to use it. Does it open on localhost, or is there some other process to use it?
Unknown, March 17, 2016 at 12:17 AM
Hi, when I tried to run an example sh script, the data got downloaded to tmp/mahout-work-root/20news-all.
But I am getting an error:
put: `/tmp/mahout-work-root/20news-all': No such file or directory

Kindly help me out.

Thank you so much in advance.