In java I would use:
MultipleInputs.addInputPath(conf, path, inputFormatClass, mapperClass)
to add multiple inputs with a different mapper for each.
Now I am using python to write a streaming job in hadoop, can a similiar job be done?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…