Apache Hadoop
1. sudo yum install maven java hadoop eclipse. Eclipse is optional, but I'd recommend it; I believe installing it drags in all the Java dev stuff you'd need.
2. vi /usr/etc/hadoop/hadoop-env.sh and edit the export JAVA_HOME line to reflect where Java is installed. On Fedora 16 (F16) it is installed in /usr/lib/jvm/java
3. Run '/usr/etc/hadoop/hadoop-env.sh'. There should be no output.
4. I'm assuming a single-node setup here, so run '/usr/sbin/hadoop-setup-single-node.sh'. It asks about 5 questions; I answered y to all of them.
5. Hadoop should now be started.
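If you'd rather script step 2 than open vi, here is a minimal sketch. The set_java_home function is a hypothetical helper (not something Hadoop ships), and it assumes the file sets the variable with a line of the form 'export JAVA_HOME=...', possibly commented out.

```shell
#!/bin/sh
# Sketch only: set_java_home is a made-up helper name, not part of Hadoop.
# Usage (as root): set_java_home /usr/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java
set_java_home() {
    env_file="$1"   # path to hadoop-env.sh
    java_dir="$2"   # where Java lives on this machine
    if grep -q '^[#[:space:]]*export JAVA_HOME=' "$env_file"; then
        # Rewrite the existing line, uncommenting it if needed.
        sed "s|^[#[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$java_dir|" \
            "$env_file" > "$env_file.tmp" && mv "$env_file.tmp" "$env_file"
    else
        # No JAVA_HOME line at all: append one.
        printf 'export JAVA_HOME=%s\n' "$java_dir" >> "$env_file"
    fi
}
```

Afterwards, run the file as in step 3 to confirm it still produces no output.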
Additionally, you can install Apache Pig quite easily with sudo yum install pig.
Apache Mahout
I find it easier to install from svn
1. cd /usr or wherever you want
2. mkdir mahout and change directory into it
3. Check out the source (straight from the docs): svn co http://svn.apache.org/repos/asf/mahout/trunk ./
4. Run mvn install or, because the build runs a number of tests, skip them with mvn -DskipTests install
5. Congrats, Mahout is installed. Add Mahout to the path: export PATH=$PATH:/usr/mahout/bin
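The export in step 5 adds the directory again every time it runs, which piles up if you put it in a shell startup file. A small sketch of a guard against that is below; path_append is a hypothetical helper name, and /usr/mahout/bin assumes you checked out into /usr/mahout as above.

```shell
#!/bin/sh
# Sketch only: path_append is a made-up helper, not part of Mahout or Maven.
# Appends a directory to PATH unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on the path, leave it alone
        *) PATH="$PATH:$1" ;;     # otherwise add it at the end
    esac
    export PATH
}

# Usage, e.g. in ~/.bashrc:
# path_append /usr/mahout/bin
```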
If you want to install Apache Hive, the instructions are practically the same as these.
Eclipse
Start up Eclipse because we want to install the m2e project.
1. Go to Help > Install New Software
2. Enter http://download.eclipse.org/releases/indigo/ as the update site. I have 3.7 (Indigo).
3. Go to General Purpose Tools and check m2e, the Maven integration for Eclipse.
Now you should be able to go to File > Import and see Maven there. Perfecto.
Easy peasy. Now you can attempt those tutorials you've been looking at.
Where might you want to go next?
https://cwiki.apache.org/confluence/display/Hive/GettingStarted - Getting started with Apache Hive
http://www.ibm.com/developerworks/java/library/j-mahout/ - Intro to Mahout
http://code.google.com/p/unresyst/wiki/CreateMahoutRecommender - Creating your first Recommender
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html - Hadoop Map/Reduce Tutorial
Oh, and grab the book Mahout in Action.
Heck, let me give you some ideas:
Let's say you want to create a recommendation engine. Try implementing one using Mahout, some Java web scraper to collect data, and a bit of data labeling with the Naive Bayes classifier. That should be enough of a start to make it interesting.
Or, if you already have a bunch of movies you like, implement a recommendation engine using Mahout and IMDb or some other movie rating site.
Want something more complex? Sentiment analysis? Grab a bunch of tweets and figure out a way to label each tweet as Positive, Negative or Neutral. Heck, even better, grab Amazon reviews (or reviews from your favorite shopping site) and figure out a way to create ratings from them, or to extract important information from them, such as durable / not durable or very cool / traditional, and the list goes on.
Credits: http://androidyou.blogspot.com/2011/11/mahout-and-hadoop-are-all-java.html