I am trying to bulk load a CSV file into HBase using the importtsv
and LoadIncrementalHFiles
tools that ship with Apache HBase.
Tutorials can be found at these pages: cloudera, apache.
I am using Apache Hadoop and HBase.
Both sources explain how to use these tools from the command prompt. However, I want to get this done from Java code. I know I can write a custom MapReduce job as explained on the Cloudera page, but I want to know whether I can use the classes corresponding to these tools directly in my Java code.
My cluster runs in pseudo-distributed mode on an Ubuntu VM inside VMware, while my Java code runs on the Windows host machine. When doing this from the command prompt on the machine running the cluster, I run the following command:
$HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-1.2.1.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2 -Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir datatsv hdfs://192.168.23.128:9000/bulkloadinputdir/
As can be seen above, we set HADOOP_CLASSPATH
. In my case, I guess I have to copy all the xyz-site.xml
Hadoop configuration files to my Windows machine and set the HADOOP_CLASSPATH
environment variable to the directory containing them. So I copied core-site.xml, hbase-site.xml, hdfs-site.xml
to my Windows machine and set that directory as the Windows environment variable HADOOP_CLASSPATH
. Apart from these, I also added all the required JARs to the Eclipse project's build path.
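For what it's worth, my understanding of what that classpath setup is supposed to achieve in code is roughly the sketch below (the D:/hadoop-conf directory is just where I put the copies, and the explicit addResource() calls are my own assumption, not something either tutorial shows):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ConfigSanityCheck {
    public static void main(String[] args) {
        // HBaseConfiguration.create() only picks up hbase-site.xml (and the Hadoop
        // *-site.xml files) if the directory containing them is on the classpath.
        Configuration conf = HBaseConfiguration.create();

        // Fallback I am considering: add the copied files explicitly.
        // D:/hadoop-conf is simply where I placed the copies on Windows (assumed path).
        conf.addResource(new Path("file:///D:/hadoop-conf/core-site.xml"));
        conf.addResource(new Path("file:///D:/hadoop-conf/hdfs-site.xml"));
        conf.addResource(new Path("file:///D:/hadoop-conf/hbase-site.xml"));

        // Quick check that the cluster settings were actually read.
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
        System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
    }
}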
But after running the project I got the following error:
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:319)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:161)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:403)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.createSubmittableJob(ImportTsv.java:493)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:737)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:747)
at HBaseImportTsvBulkLoader.createStoreFilesFromHdfsFiles(HBaseImportTsvBulkLoader.java:36)
at HBaseImportTsvBulkLoader.main(HBaseImportTsvBulkLoader.java:17)
So somehow importtsv
is still not able to find the location of the cluster.
This is what my basic code looks like:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class HBaseImportTsvBulkLoader {
    static Configuration config;

    public static void main(String[] args) throws Exception {
        config = new Configuration();
        copyFileToHDFS();
        createStoreFilesFromHdfsFiles();
        loadStoreFilesToTable();
    }

    // Copy the local input file into HDFS.
    private static void copyFileToHDFS() throws IOException {
        config.set("fs.defaultFS", "hdfs://192.168.23.128:9000");
        FileSystem hdfs = FileSystem.get(config);
        Path localfsSourceDir = new Path("D:\\delete\\bulkloadinputfile1");
        Path hdfsTargetDir = new Path(hdfs.getWorkingDirectory() + "/");
        hdfs.copyFromLocalFile(localfsSourceDir, hdfsTargetDir);
    }

    // Run ImportTsv to generate HFiles in the bulk output directory.
    private static void createStoreFilesFromHdfsFiles() throws Exception {
        String[] _args = {"-Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir",
                "-Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2",
                "datatsv",
                "hdfs://192.168.23.128:9000/bulkloadinputdir/"};
        ImportTsv.main(_args); // **throws exception**
    }

    // Load the generated HFiles into the table.
    private static void loadStoreFilesToTable() throws Exception {
        String[] _args = {"hdfs://192.168.23.128:9000/bulkloadoutputdir", "datatsv"};
        LoadIncrementalHFiles.main(_args);
    }
}
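As an aside, for the last step I was wondering whether I could avoid LoadIncrementalHFiles.main() altogether and call the programmatic API instead. Something like the sketch below is what I had in mind (based on my reading of the 1.2.x javadoc for doBulkLoad(); I haven't been able to try it yet because the ImportTsv step fails first, so the exact signature is my assumption):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class ProgrammaticBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("datatsv");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin();
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName)) {
            // Point the loader at the HFiles produced in ImportTsv's bulk.output directory.
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path("hdfs://192.168.23.128:9000/bulkloadoutputdir"),
                    admin, table, locator);
        }
    }
}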
Questions
Which xyz-site.xml
files are required?
How should I specify HADOOP_CLASSPATH
?
Can I pass the required arguments, such as -Dhbase.rootdir
, to the main()
method of ImportTsv
as shown below?
String[] _args = {"-Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir",
"-Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2",
"-Dhbase.rootdir=hdfs://192.168.23.128:9000/hbase",
"datatsv",
"hdfs://192.168.23.128:9000/bulkloadinputdir/"};
Can I use ImportTsv.setConf()
to set the same?
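To make the last two questions concrete, what I had in mind is something along the following lines, i.e. building a Configuration myself and handing it to the tools through ToolRunner instead of calling their main() methods (only a sketch: the ZooKeeper quorum/port values are my guess at what the tools need to locate the cluster, and I also suspect the main() methods call System.exit(), which would stop my loadStoreFilesToTable() step from ever running):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.util.ToolRunner;

public class BulkLoadViaToolRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Cluster location settings I assume are needed; values match my pseudo-distributed VM.
        conf.set("fs.defaultFS", "hdfs://192.168.23.128:9000");
        conf.set("hbase.zookeeper.quorum", "192.168.23.128");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        String[] importArgs = {
                "-Dimporttsv.columns=HBASE_ROW_KEY,d:c1,d:c2",
                "-Dimporttsv.bulk.output=hdfs://192.168.23.128:9000/bulkloadoutputdir",
                "datatsv",
                "hdfs://192.168.23.128:9000/bulkloadinputdir/"};
        // ToolRunner parses the -D options and calls setConf() on the tool,
        // so both the explicit conf above and the -D arguments should be honored.
        int rc = ToolRunner.run(conf, new ImportTsv(), importArgs);
        if (rc != 0) {
            throw new IllegalStateException("ImportTsv failed with exit code " + rc);
        }

        String[] loadArgs = {
                "hdfs://192.168.23.128:9000/bulkloadoutputdir",
                "datatsv"};
        ToolRunner.run(conf, new LoadIncrementalHFiles(conf), loadArgs);
    }
}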