
eclipse - Using HBase importtsv tool to bulk load data from Java code

I am trying to bulk load a CSV file into HBase using the importtsv and LoadIncrementalHFiles tools that ship with Apache HBase.

Tutorials can be found on these pages: Cloudera, Apache.

I am using Apache Hadoop and HBase.

Both sources explain how to use these tools from the command prompt. However, I want to get this done from Java code. I know I can write a custom MapReduce job as explained on the Cloudera page, but I want to know whether I can use the classes corresponding to these tools directly in my Java code.

My cluster runs on an Ubuntu VM inside VMware in pseudo-distributed mode, while my Java code runs on the Windows host machine. When doing this from the command prompt on the machine running the cluster, we run the following command:

HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-1.2.1.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2 -Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir datatsv  hdfs://192.168.23.128:9000/bulkloadinputdir/

As can be seen above, we set HADOOP_CLASSPATH. In my case, I guess I have to copy all the xyz-site.xml Hadoop configuration files to my Windows machine and set the directory containing them as the HADOOP_CLASSPATH environment variable. So I copied core-site.xml, hbase-site.xml, and hdfs-site.xml to my Windows machine and pointed the HADOOP_CLASSPATH environment variable at that directory. Apart from these, I also added all the required JARs to the Eclipse project's build path.
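
One alternative I am considering, instead of relying on HADOOP_CLASSPATH on the Windows side, is loading the copied site files into the Configuration programmatically with addResource(). A minimal sketch (the D:/hadoop-conf directory is just a placeholder for wherever the copied files live):

    // Load the copied site files explicitly instead of relying on HADOOP_CLASSPATH.
    Configuration conf = HBaseConfiguration.create();
    conf.addResource(new Path("file:///D:/hadoop-conf/core-site.xml"));  // hypothetical local path
    conf.addResource(new Path("file:///D:/hadoop-conf/hdfs-site.xml"));
    conf.addResource(new Path("file:///D:/hadoop-conf/hbase-site.xml"));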

But after running the project, I got the following error:

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:319)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
    at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
    at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:403)
    at org.apache.hadoop.hbase.mapreduce.ImportTsv.createSubmittableJob(ImportTsv.java:493)
    at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:737)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
    at HBaseImportTsvBulkLoader.createStoreFilesFromHdfsFiles(HBaseImportTsvBulkLoader.java:36)
    at HBaseImportTsvBulkLoader.main(HBaseImportTsvBulkLoader.java:17)

So somehow importtsv is still not able to locate the cluster.
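
From what I have read, "Can't get the locations" usually means the client cannot reach ZooKeeper, which is how an HBase client locates the cluster (hbase.rootdir is not used for that lookup). A minimal sketch of pointing the client at ZooKeeper explicitly, assuming the ZooKeeper that HBase manages in pseudo-distributed mode is listening on its default port:

    Configuration conf = HBaseConfiguration.create();
    // Region locations are resolved through ZooKeeper, not hbase.rootdir.
    conf.set("hbase.zookeeper.quorum", "192.168.23.128");
    conf.set("hbase.zookeeper.property.clientPort", "2181");  // ZooKeeper's default client port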

This is what my basic code looks like:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HBaseImportTsvBulkLoader {
    static Configuration config;

    public static void main(String[] args) throws Exception {
        config = new Configuration();
        copyFileToHDFS();
        createStoreFilesFromHdfsFiles();
        loadStoreFilesToTable();
    }

    // Copy the local input file into HDFS.
    private static void copyFileToHDFS() throws IOException {
        config.set("fs.defaultFS", "hdfs://192.168.23.128:9000");
        FileSystem hdfs = FileSystem.get(config);
        Path localfsSourceDir = new Path("D:\\delete\\bulkloadinputfile1");
        Path hdfsTargetDir = new Path(hdfs.getWorkingDirectory() + "/");
        hdfs.copyFromLocalFile(localfsSourceDir, hdfsTargetDir);
    }

    // Run importtsv to generate HFiles in the bulk output directory.
    private static void createStoreFilesFromHdfsFiles() throws Exception {
        String[] _args = {"-Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir",
                "-Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2",
                "datatsv",
                "hdfs://192.168.23.128:9000/bulkloadinputdir/"};
        ImportTsv.main(_args);                                 // **throws exception**
    }

    // Load the generated HFiles into the table; this directory must match
    // the importtsv.bulk.output directory above.
    private static void loadStoreFilesToTable() throws Exception {
        String[] _args = {"hdfs://192.168.23.128:9000/bulkloadoutputdir", "datatsv"};
        LoadIncrementalHFiles.main(_args);
    }
}
Questions

  1. Which xyz-site.xml files are required?

  2. How should I specify HADOOP_CLASSPATH?

  3. Can I pass the required arguments to the main() method of ImportTsv, such as -Dhbase.rootdir as below:

    String[] _args = {"-Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir",
            "-Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2",
            "-Dhbase.rootdir=hdfs://192.168.23.128:9000/hbase",
            "datatsv",
            "hdfs://192.168.23.128:9000/bulkloadinputdir/"};
    
  4. Can I use ImportTsv.setConf() to set the same? (See the sketch after this list.)
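
Regarding questions 3 and 4, my understanding is that main() already routes the arguments through ToolRunner (visible in the stack trace), so generic -D options are parsed; and since ImportTsv extends Configured, setConf() should also work. A minimal sketch of the setConf() route, assuming the importtsv.* property keys match the -D option names:

    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", "192.168.23.128");
    conf.set("importtsv.columns", "HBASE_ROW_KEY,d:c1,d:c2");
    conf.set("importtsv.bulk.output", "hdfs://192.168.23.128:9000/bulkloadoutputdir");

    ImportTsv importTsv = new ImportTsv();
    importTsv.setConf(conf);
    // run() takes only the table name and input directory; -D parsing happens in ToolRunner, not here.
    int exit = importTsv.run(new String[] {"datatsv", "hdfs://192.168.23.128:9000/bulkloadinputdir/"});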


1 Reply

Waiting for answers.
