Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
589 views
in Technique[技术] by (71.8m points)

hadoop - Oozie + Sqoop: JDBC Driver Jar Location

I have a 6 node cloudera based hadoop cluster and I'm trying to connect to an oracle database from a sqoop action in oozie.

I have copied my ojdbc6.jar into the sqoop lib location (which for me happens to be at: /opt/cloudera/parcels/CDH-4.2.0-1.cdh4.2.0.p0.10/lib/sqoop/lib/ ) on all the nodes and have verified that I can run a simple 'sqoop eval' from all the 6 nodes.

Now when I run the same command using Oozie's sqoop action, I get "Could not load db driver class: oracle.jdbc.OracleDriver"

I have read this article about using shared libs and it makes sense to me when we're talking about my task/action/workflow specific dependencies. But I see a JDBC driver installation as an extention to sqoop and so I think it belongs in the sqoop installation lib.

Now the question is, while sqoop sees this ojdbc6 jar I have put into it's lib folder, how come my Oozie workflow doesn't see it?

Is this something expected or am I missing something?

As an aside, what do you guy think about where is the appropriate location for a JDBC driver jar?

Thanks in advance!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The JDBC driver jar (and any jars it depends on) should go in your Oozie sharelib folder on HDFS. I'm running Hortonworks Data Platform 1.2 instead of Cloudera 4.2 so the details may vary, but my JDBC driver is located in /user/oozie/share/lib/sqoop. This should allow you to run Sqoop with the JDBC via Oozie.

It is not necessary to put to the JDBC driver jar in the sqoop lib on the data nodes. In my setupt I can't run a simple sqoop eval from the command line on my data nodes. I understand the logic for why you thought this would work. The reason the JDBC driver jar needs to be on HDFS is so that all the data nodes have access to it. Your solution should accomplish the same goal. I'm not familiar enough with the inner workings of Oozie to say why using the sharelib works but your solution does not.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...