Hadoop to the rescue?
I have mentioned elsewhere on this blog that I am currently doing a thesis in which I am using Hadoop and MapReduce. One thing I have always liked, whether in school or in life in general, is seeing how what I am learning applies in the "real world". Once I understand why I really need to learn and know this stuff, I can usually put my mind to it and get motivated to learn. In high school, I took Advanced Math, not out of love for math but because it was the most convenient way to take Computer Science. Looking back on it now, I wish someone had really explained to me why I needed to pay more attention. I really did not foresee how some of the stuff we were doing could possibly come in handy later on in life.
Anyway, back to my thesis. This afternoon, I came across an article at GigaOm by Gary Orenstein entitled "Digging Deeper Into Data With Hadoop". In the article, the author gives an overview of Hadoop and how it's being used by some of the top companies. To me, this made my work feel very relevant because, coincidentally, I had just finished writing a MapReduce program that processes a server log to find query terms and their frequencies. According to Gary, Yahoo is using Hadoop to build a database for its Search Assist feature. I was smiling when I read that before Hadoop, the task used to take 26 days to complete! Thanks to Hadoop, it now takes only 20 minutes! I guess that's a cluster with thousands of nodes. I am also encouraged that even Microsoft is using Hadoop, although it's not clear whether that use is limited to Powerset.
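To make the idea concrete, here is a minimal sketch of a query-term frequency count written in the MapReduce style. It is not the thesis program itself: it is plain Python that imitates the map, shuffle, and reduce phases on one machine. The log layout, the `q` URL parameter, and the function names (`map_line`, `shuffle`, `reduce_counts`, `term_frequencies`) are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs


def map_line(line):
    """Map phase: emit (term, 1) for every query term found in one log line."""
    for token in line.split():
        # Assumption: the requested URL is a whitespace-separated token and
        # carries the search query in a "q" parameter (hypothetical layout).
        if "?" not in token:
            continue
        params = parse_qs(urlsplit(token).query)
        for query in params.get("q", []):
            for term in query.lower().split():
                yield term, 1


def shuffle(pairs):
    """Shuffle phase: group values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(term, counts):
    """Reduce phase: sum the counts emitted for one term."""
    return term, sum(counts)


def term_frequencies(lines):
    """Run map, shuffle, and reduce locally over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(t, c) for t, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Made-up sample lines, only to exercise the sketch.
    sample_log = [
        '10.0.0.1 - - [07/Jun/2009:10:00:00] "GET /search?q=hadoop+cluster HTTP/1.1" 200',
        '10.0.0.2 - - [07/Jun/2009:10:00:05] "GET /search?q=Hadoop+mapreduce HTTP/1.1" 200',
        '10.0.0.3 - - [07/Jun/2009:10:00:09] "GET /index.html HTTP/1.1" 200',
    ]
    for term, count in sorted(term_frequencies(sample_log).items()):
        print(term, count)
```

On a real cluster the framework handles the shuffle step and spreads the map and reduce calls across the nodes, which is where the speedup comes from; the per-line and per-term logic stays about this small.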
I am glad I found this article because it shows that my work is relevant and applicable to real-world situations. Although I am working on a relatively small cluster with only 12 nodes, I know all this is not in vain!
Sunday, June 07, 2009 | 0 Comments