I had the exact same situation a while back while switching to YARN. Basically there was the concept of task slots
in MRv1 and containers
in MRv2. Both of these differ very much in how the tasks are scheduled and run on the nodes.
The reason that your job is stuck is that it is unable to find/start a container
. If you go into the full logs of Resource Manager/Application Master
etc daemons, you may find that it is doing nothing after it starts to allocate a new container.
To solve the problem, you have to tweak your memory settings in yarn-site.xml
and mapred-site.xml
. While doing the same myself, I found this and this tutorials especially helpful. I would suggest you to try with the very basic memory settings and optimize them later on. First check with a word count example then go on to other complex ones.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…