Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

oracle - Getting an external table's location and file name?

I'm processing multiple files as part of an external table. Is there any way I can get the file name being processed in external tables and place it in the table?

At the moment the only solution I can find is appending the file name to every record in the flat file which isn't ideal from an efficiency standpoint and involves modifying the original data. Surely external tables know what file is being processed at any time?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I'm not aware of any way to capture the file name within the access parameters. As a workaround, rather than modifying the original files you could use a preprocessor to append the file name on the fly. If you had two files, say file_1.csv containing a,b,1 and file_2.csv containing c,d,2, you could have a small shell script like append_filename.sh:

#!/bin/bash
while read line
do
  printf "%s,%s
" "${line}" "${1##*/}"
done < $1

which you can verify does something helpful by calling the script directly:

$ ./append_filename.sh file_1.csv
a,b,1,file_1.csv

You can then define your external table to call that via the preprocessor clause, something like:

create table e42 (
  col1 varchar2(10),
  col2 varchar2(10),
  col3 number,
  filename varchar2(30)
)
organization external (
  type oracle_loader
  default directory d42
  access parameters (
    records delimited by newline
    preprocessor 'append_filename.sh'
    fields terminated by ','
  )
  location ('file_1.csv', 'file_2.csv')
);

Table E42 created.

Then the file name is picked up automatically:

select * from e42;

COL1       COL2             COL3 FILENAME                     
---------- ---------- ---------- ------------------------------
a          b                   1 file_1.csv                    
c          d                   2 file_2.csv                    

I've stripped the directory path so you only see the file name - you could keep the full path if you prefer, but that may not be necessary and could reveal OS details to people who can only query the table. Note the security guidelines; I've kept it simple here by using one directory for everything, but you should put the preprocessor somewhere else. And of course this is assuming a Unix-y platform or GNU tools; something similar should be possible with a batch file if you're using Windows.


This approach reading line-by-line will be relatively slow for large files; with a 1.5 million-row test file appending the file name took about 80 seconds on my platform. Other built-in tools will be faster; this version with sed takes just over a second for the same file:

#!/bin/bash
sed -e 's!$!,'"${1##*/}"'!' $1

You could try other alternative such as awk too; you'd probably need to test a few to see what works best (or fast enough) in your environment.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...