Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
3.3k views
in Technique by (71.8m points)

How can a large file be split with MapReduce and stored into HBase?

  • How can a large file (hundreds of MB, or several GB) be split with MapReduce into multiple 5 MB pieces and stored into HBase?
  • Why does my custom MapReduce driver, which uses FileInputFormat as the input, always split the input line by line? How do I change that?
  • Driver code:
        // 1. Load the configuration
        Configuration conf = new Configuration();
        // 2. Job-level settings (reverse-timestamp row key so newer rows sort first)
        conf.set("rowKeyName", String.valueOf(Long.MAX_VALUE - Calendar.getInstance().getTimeInMillis()));
        conf.set("columnFamilyName", "data");
        conf.set("columnPrefixName", "scar_");
        conf.set("hbase.zookeeper.quorum", "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181");
        // 3. Get a Job instance
        Job job = Job.getInstance(conf);
        // 4. Set the jar containing this driver class
        job.setJarByClass(Hdfs2HbaseDriver.class);
        // 5. Set the Mapper class for this job
        job.setMapperClass(Hdfs2HbaseMapper.class);
        // 6. Set the mapper output key/value types
        job.setMapOutputKeyClass(NullWritable.class);
        // Likely missing in the original: TableOutputFormat expects Put/Delete values
        job.setMapOutputValueClass(Put.class);
        // 7. Map-only job: zero reduce tasks, mappers write to HBase directly
        job.setNumReduceTasks(0);
        // 8. Input path and split-size bounds
        //    (3145728 bytes = 3 MB; for the 5 MB chunks in the question this would be 5242880)
        FileInputFormat.setMaxInputSplitSize(job, 3145728);
        FileInputFormat.setMinInputSplitSize(job, 3145728);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // 9. Target HBase table; the reducer class is null because the job is map-only
        TableMapReduceUtil.initTableReducerJob("object", null, job);
        TableMapReduceUtil.addDependencyJars(job);
        // 10. Submit to YARN and wait for completion
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
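The line-by-line behavior asked about in the second bullet comes from the default input format: when no InputFormat is set, Hadoop uses TextInputFormat, and setMinInputSplitSize/setMaxInputSplitSize only bound where splits are cut, while the RecordReader still hands the mapper one line per record. One common approach (not from this post) is a custom FileInputFormat whose RecordReader returns the whole split as a single BytesWritable. The chunking step itself is just fixed-size slicing; here is a minimal sketch in plain Java, with no Hadoop dependencies, where the class and constant names (ChunkSplitter, CHUNK_SIZE) are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: cut a byte payload into fixed-size pieces, the way a binary-chunk
// mapper would before building one HBase Put per piece.
public class ChunkSplitter {
    static final int CHUNK_SIZE = 5 * 1024 * 1024; // 5 MB, as in the question

    // Returns the payload split into chunks of at most chunkSize bytes;
    // the last chunk holds whatever remains.
    static List<byte[]> chunk(byte[] data, int chunkSize) {
        List<byte[]> out = new ArrayList<>();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            byte[] piece = new byte[len];
            System.arraycopy(data, off, piece, 0, len);
            out.add(piece);
        }
        return out;
    }

    public static void main(String[] args) {
        // A 12 MB + 17 B payload splits into 5 MB, 5 MB, and a remainder chunk.
        byte[] payload = new byte[12 * 1024 * 1024 + 17];
        List<byte[]> parts = chunk(payload, CHUNK_SIZE);
        System.out.println(parts.size());            // 3
        System.out.println(parts.get(2).length);     // 2097169 (2 MB + 17 B)
    }
}
```

In a real job each chunk would become one Put keyed by something like rowKeyName plus the chunk index, so the pieces can be reassembled in order on read.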

He who fights monsters for too long becomes a monster himself; gaze long into the abyss, and the abyss gazes back…

1 Reply

0 votes
by (71.8m points)
Still waiting for an expert to answer this.


...