Getting Eclipse to work with Hadoop
Before I began what I thought would be an easy task, setting up Eclipse so that I could write MapReduce programs and run them without launching each job manually, I found a website stating that getting Hadoop running could be a whole day's job. They were marketing a Hadoop live CD, and I just assumed it was a clever marketing ploy. Unfortunately, with my limited Linux experience, the author was right! (Another live CD, virtual image)
If you got to this article searching for a tutorial, this post will likely disappoint you. I know the feeling. I scoured the internet endlessly for step-by-step instructions but found mostly unanswered questions.
I had set up Hadoop before on both my laptop and desktop following Michael Noll's instructions, running in pseudo-distributed mode. The problem was getting Eclipse set up to run MapReduce jobs, a very cool and time-saving feature. After following many how-tos, I ended up with a bad Hadoop installation. The cause: I had replaced the hadoop-site.xml file with a different one that put the Hadoop temporary directory under Ubuntu's temporary directory. It worked until I rebooted. I found out later, the hard way, that rebooting deleted the directories and files Hadoop needed. As a result, the install works right after you format the namenode but shows all sorts of problems once you reboot.
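A check like the following might catch the temporary-directory problem before a reboot does. It is only a minimal sketch: it assumes the directory is set through the hadoop.tmp.dir property, and the sample XML, the /tmp/hadoop-example path, and the helper names get_property and is_wiped_on_reboot are made up for illustration, not taken from my actual configuration.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Hypothetical hadoop-site.xml content; the path is an example only.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-example</value>
  </property>
</configuration>
"""


def get_property(site_xml, name):
    """Return the value of a named <property>, or None if it is not set."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def is_wiped_on_reboot(path, volatile_roots=("/tmp",)):
    """True if path is one of the volatile roots or sits below one.

    Assumes, as on my Ubuntu machines, that /tmp is cleared at boot.
    """
    p = PurePosixPath(path)
    roots = [PurePosixPath(r) for r in volatile_roots]
    return any(p == r or r in p.parents for r in roots)


tmp_dir = get_property(SAMPLE_SITE_XML, "hadoop.tmp.dir")
if tmp_dir is not None and is_wiped_on_reboot(tmp_dir):
    print("Warning: hadoop.tmp.dir is under /tmp and may not survive a reboot")
```

To check a real file, read its contents into a string and pass that in place of SAMPLE_SITE_XML.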
The other major problem was that Eclipse kept hitting all sorts of access errors. Noll's instructions advise creating a separate hadoop account. That was no problem over ssh, but it was a problem when connecting through Eclipse. What I eventually did was drop the separate hadoop account, which eliminated all the problems I was having.
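I never pinned down the exact errors, so the following is only a guess at a useful diagnostic. It assumes the access errors came from Eclipse running under one account while the Hadoop directories belonged to another, and it simply reports whether the current user owns and can read and write a given directory. The function name access_report is hypothetical, and os.getuid makes this Unix-only.

```python
import os
import tempfile


def access_report(path):
    """Summarize who owns path and what the current user may do with it."""
    st = os.stat(path)
    return {
        "owner_uid": st.st_uid,
        "current_uid": os.getuid(),
        "owned_by_me": st.st_uid == os.getuid(),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
    }


# Demonstrated on a throwaway directory; point it at the Hadoop data
# directory from the same account that launches Eclipse.
with tempfile.TemporaryDirectory() as demo_dir:
    print(access_report(demo_dir))
```

If owned_by_me or writable comes back False for the Hadoop directories, a mismatch between accounts is a plausible suspect.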
I wish the frustration had not prevented me from taking better notes so that my experience could be of more help to someone. The trick, though, was to use Eclipse 3.3. The most recent versions had too many options, which became too complicated for someone who has yet to fully master Linux.