Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Regex return file name, remove path and file extension

I have a data.frame that contains a text column of file names. I would like to return the file name without the path or the file extension. Typically, my file names have been numbered, but they don't have to be. For example:

df<-data.frame(data=c("a","b"),fileNames=c("C:/a/bb/ccc/NAME1.ext","C:/a/bb/ccc/d D2/name2.ext"))

I would like to return the equivalent of

df<-data.frame(data=c("a","b"),fileNames=c("NAME","name"))

but I cannot figure out the slick regular expression to do this with gsub. For example, I can get rid of the extension with (provided the file name ends with a number):

gsub('([0-9]).ext','',df[,"fileNames"])

Though I've been trying various patterns (by reading the regex help files and similar solutions on this site), I can't get a regex to return the text between the last "/" and the first ".". Any thoughts or forwards to similar questions are much appreciated!

The best I have gotten is:

 gsub('*[[:graph:]_]/|*[[:graph:]_].ext','',df[,"fileNames"])

But this 1) doesn't get rid of all the leading path characters and 2) is dependent on a specific file extension.


1 Reply


Perhaps this will get you closer to your solution:

library(tools)
basename(file_path_sans_ext(df$fileNames))
# [1] "NAME1" "name2"

The file_path_sans_ext function is from the "tools" package, which ships with the standard R distribution; it returns the path up to (but not including) the extension. The basename function then strips the directory part of the path.
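If the trailing number should go as well, as in the output the question asks for, one possible follow-up is a second sub call. This is only a sketch: it assumes the numbering is always a run of digits at the very end of the name, and the variable names stems and stripped are illustrative, not from the original post.

```r
library(tools)

df <- data.frame(data = c("a", "b"),
                 fileNames = c("C:/a/bb/ccc/NAME1.ext",
                               "C:/a/bb/ccc/d D2/name2.ext"),
                 stringsAsFactors = FALSE)

# Drop the extension first, then the directory part
stems <- basename(file_path_sans_ext(df$fileNames))

# Assumption: the numbering is a run of digits at the end of the name
stripped <- sub("[0-9]+$", "", stems)
stripped
# [1] "NAME" "name"
```

Names that legitimately end in a digit would lose it too, so this step only makes sense when the trailing digits really are just numbering.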

Or, to take from file_path_sans_ext and modify it a bit, you can try:

sub("(.*\\/)([^.]+)(\\.[[:alnum:]]+$)", "\\2", df$fileNames)
# [1] "NAME1" "name2"

Here, I've "captured" all three parts of each "fileNames" value, so if you wanted just the file paths, you would change "\\2" to "\\1", and if you wanted just the file extensions, you would change it to "\\3".
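To make the three groups concrete, here is a small sketch that pulls out each part in turn. The pattern is written with doubled backslashes, as R string literals require, and the variable names (pattern, paths, stems, exts) are just for illustration.

```r
fileNames <- c("C:/a/bb/ccc/NAME1.ext", "C:/a/bb/ccc/d D2/name2.ext")

# Group 1: everything up to the last "/"
# Group 2: the file name, up to the first "."
# Group 3: the extension, including the leading "."
pattern <- "(.*\\/)([^.]+)(\\.[[:alnum:]]+$)"

paths <- sub(pattern, "\\1", fileNames)
stems <- sub(pattern, "\\2", fileNames)
exts  <- sub(pattern, "\\3", fileNames)

paths
# [1] "C:/a/bb/ccc/"      "C:/a/bb/ccc/d D2/"
stems
# [1] "NAME1" "name2"
exts
# [1] ".ext" ".ext"
```

Inputs that do not match the pattern at all (for example, a name with no directory part) are returned unchanged by sub, so it may be worth checking for that case before relying on the result.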

