Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - R Sort strings according to substring

I have a set of file names like:

filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

and I would like to sort them according to the number after "-".

In Python, for instance, I can use the key parameter of the sorted function:

filelist = ["filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt"]
sorted(filelist, key=lambda x: int(x.split("-")[1].split(".")[0]))

> ["filec-1.txt", "fileb-2.txt", "filef-4.txt", "filed-5.txt", "filea-10.txt"]

In R, I am playing with strsplit and lapply with no luck so far.

What is the way to do this in R?

Edit: File names can be many things and may include more numbers. The only fixed pattern is that the number I want to sort by is after the "-". Another (real) example:

x <- c("boards10017-51.mp4",  "boards10065-66.mp4",  "boards10071-81.mp4",
       "boards10185-91.mp4",  "boards10212-63.mp4",  "boards1025-51.mp4",
       "boards1026-71.mp4",   "boards10309-89.mp4",  "boards10310-68.mp4",
       "boards10384-50.mp4",  "boards10398-77.mp4",  "boards10419-119.mp4",
       "boards10421-85.mp4",  "boards10444-87.mp4",  "boards10451-60.mp4",
       "boards10461-81.mp4",  "boards10463-52.mp4",  "boards10538-83.mp4",
       "boards10575-62.mp4",  "boards10577-249.mp4")

1 Reply

I'm not sure of the actual complexity of your list of file names, but something like the following might be sufficient:

filelist[order(as.numeric(gsub("[^0-9]+", "", filelist)))]
# [1] "filec-1.txt"  "fileb-2.txt"  "filef-4.txt"  "filed-5.txt"  "filea-10.txt"
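
To see why this works, the one-liner can be unpacked into its three steps. This is only an illustrative breakdown; the intermediate names digits, keys and sorted_files are made up here and do not appear in the original answer.

```r
filelist <- c("filea-10.txt", "fileb-2.txt", "filec-1.txt", "filed-5.txt", "filef-4.txt")

# Step 1: delete every run of non-digit characters, leaving only the digits.
digits <- gsub("[^0-9]+", "", filelist)
# digits is now c("10", "2", "1", "5", "4")

# Step 2: convert to numbers so that 10 sorts after 5
# (compared as strings, "10" would come before "2").
keys <- as.numeric(digits)

# Step 3: order() returns the permutation that sorts the keys;
# indexing with it reorders the original file names.
sorted_files <- filelist[order(keys)]
```

Note that this pattern keeps every digit in the name, so a name such as "boards10017-51.mp4" would produce the key 10017514 rather than 51.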

Considering your edit, you may want to change the gsub to something like:

gsub(".*-|\\..*", "", filelist)

Again, without a few more test cases, it's hard to say whether this is sufficient for your needs.
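
A few made-up file names can show where this pattern holds up and where it does not. Apart from the first one, the names below are hypothetical and not from the question, and extract_key is just a helper name chosen for this sketch.

```r
# Hypothetical helper: strip everything up to the last "-" and
# everything from the first remaining "." onward, then convert to a number.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
extract_key <- function(x) {
  suppressWarnings(as.numeric(gsub(".*-|\\..*", "", x)))
}

extract_key("boards10017-51.mp4")  # 51: digits before the dash are ignored
extract_key("v1.2-7.txt")          # 7: a dot before the dash is harmless
extract_key("a-5-final.txt")       # NA: the text after the last dash is not a number
extract_key("file10.txt")          # NA: there is no dash at all
```

If names like the last two can occur, the NA keys would need separate handling before sorting.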


Example:

x <- c("boards10017-51.mp4", "boards10065-66.mp4", "boards10071-81.mp4",
       "boards10185-91.mp4", "boards10212-63.mp4", "boards1025-51.mp4",
       "boards1026-71.mp4", "boards10309-89.mp4", "boards10310-68.mp4",
       "boards10384-50.mp4", "boards10398-77.mp4", "boards10419-119.mp4",
       "boards10421-85.mp4", "boards10444-87.mp4", "boards10451-60.mp4",
       "boards10461-81.mp4", "boards10463-52.mp4", "boards10538-83.mp4",
       "boards10575-62.mp4", "boards10577-249.mp4")

x[order(as.numeric(gsub(".*-|\\..*", "", x)))]
##  [1] "boards10384-50.mp4"  "boards10017-51.mp4"  "boards1025-51.mp4"  
##  [4] "boards10463-52.mp4"  "boards10451-60.mp4"  "boards10575-62.mp4" 
##  [7] "boards10212-63.mp4"  "boards10065-66.mp4"  "boards10310-68.mp4" 
## [10] "boards1026-71.mp4"   "boards10398-77.mp4"  "boards10071-81.mp4" 
## [13] "boards10461-81.mp4"  "boards10538-83.mp4"  "boards10421-85.mp4" 
## [16] "boards10444-87.mp4"  "boards10309-89.mp4"  "boards10185-91.mp4" 
## [19] "boards10419-119.mp4" "boards10577-249.mp4" 
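
As an alternative sketch, not part of the original answer, the number can be captured directly instead of deleting everything around it: sub() with a capture group keeps only the digits that follow the last "-" which has digits after it. The function name sort_by_dash_number is invented for illustration, and the sketch assumes every name contains a "-" followed by at least one digit.

```r
sort_by_dash_number <- function(files) {
  # Capture the digits after the last "-" that is followed by digits;
  # "\\1" replaces the whole string with just that captured group.
  keys <- as.numeric(sub(".*-([0-9]+).*", "\\1", files))
  # Names that do not match are left unchanged by sub(), so they become NA
  # (with a warning) and order() places them last by default.
  files[order(keys)]
}

sort_by_dash_number(c("filea-10.txt", "fileb-2.txt", "filec-1.txt",
                      "filed-5.txt", "filef-4.txt"))
```

Like the order() call above, this keeps names with equal keys in their original relative order.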
