The reduce copier failed!

I spent most of this weekend trying to figure out what was wrong with either my Hadoop cluster (running version 0.18.2) or one of the jobs I was submitting. I kept getting a "reduce copier failed" error. I suspected that this might have to do with mapred.child.java.opts, but changing it to -Xmx512M did not help. I tried googling to see if anybody else had a similar problem. As it turned out, I had once had a similar problem myself, although at that time it was on a development pseudo-distributed install running on my laptop. Back then, using conf.set("mapred.job.tracker", "local") did the trick, so I assumed that removing this option when I was ready to run the program on a multi-node cluster would work. Unfortunately, it did not. Other jobs, including the example programs, executed without any problems. This particular job, however, processes huge amounts of data (typical for Hadoop jobs, I know), so I thought that might be the cause of the problem.
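For reference, the two settings mentioned above could be kept together roughly like this. It is only a sketch: JobSettings and build are made-up names, and a plain java.util.Properties stands in for Hadoop's JobConf so the snippet runs on its own. With the real API, the same keys and values would go through conf.set(...), as in the call quoted above.

```java
import java.util.Properties;

// Stand-in for the job configuration: a plain Properties object is used here
// instead of Hadoop's JobConf so the sketch runs without Hadoop on the classpath.
final class JobSettings {

    // Hypothetical helper (not part of Hadoop): collects the two settings from the post.
    static Properties build(boolean localMode) {
        Properties conf = new Properties();
        // JVM options for the child task processes; the heap size tried in the post.
        conf.setProperty("mapred.child.java.opts", "-Xmx512M");
        if (localMode) {
            // Development only: "local" selects the in-process local job runner
            // instead of a real JobTracker. Leave it out on a multi-node cluster.
            conf.setProperty("mapred.job.tracker", "local");
        }
        return conf;
    }

    public static void main(String[] args) {
        // Pass "local" as the first argument to include the development override.
        boolean localMode = args.length > 0 && args[0].equals("local");
        build(localMode).list(System.out);
    }
}
```

Keeping the "local" override behind a flag makes it harder to forget it in the code when the job moves from the laptop to the cluster.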

After a very long time, countless Google searches and I don't know how many documents read, I had not found a solution. There were several references to the "reduce copier failed" problem, but mine seemed to be a little different. At some point I thought this could be an issue that had been resolved in a newer version, but I could not find any evidence that someone else had hit this exact problem before, so I was determined to get under the hood, play around with the settings and make it work. I failed. So I reluctantly upgraded to 0.19.1 (I don't remember why I did not do this in the first place) and boom! It worked.

So "reduce copier failed" was resolved by upgrading to 0.19.1.
