Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Listing all files matching a full-path pattern in R

I am trying to obtain the list of files matching a full-path pattern. So far I have tried list.files(), but it did not work.

Let's assume that we have the following directory organization:

results
   |- A
   |  |- data-1.csv
   |  |- data-2.csv
   |
   |- B
      |- data-1.csv
      |- data-2.csv

Then the following command:

list.files(pattern='data-.*\.csv', recursive=TRUE)

will return all the files matching the pattern. This works, but the problem appears when using a full-path pattern. For instance, if I want to obtain all the CSV files from directory results/A, I could do:

list.files(pattern='results/A/data-.*\.csv', recursive=TRUE)

This does not work, though. It seems that R cannot use a full-path pattern as a regular expression. In this case, the solution could be to use results/A as the base path, but in more complex problems that cannot be done. For instance, at some point we may want to match only the subdirectories whose names consist solely of uppercase letters:

list.files(pattern='results/[A-Z]+/data-.*\.csv', recursive=TRUE)

Is it possible to do this in R?

UPDATE: After relying on ad hoc solutions for a while, I got tired of typing the same code again and again, so I created a library to simplify this task.


1 Reply

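First, note that your pattern is not a valid regular expression string in R: inside a string literal the backslash must itself be escaped, so the regex \. (a literal dot) is written as '\\.'. Your first example should be:

list.files(pattern='data-.*\\.csv', recursive=TRUE)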

Then, the pattern matching inside list.files appears to be applied to the file basenames only (i.e., not including the directory path), so you could split the task into two steps:

  1. Find all files matching the basename only, return their full paths:

    basename.matches <- list.files(pattern='data-.*\\.csv', recursive=TRUE,
                                   full.names = TRUE)
    basename.matches
    # [1] "./results/A/data-1.csv" "./results/A/data-2.csv" "./results/B/data-1.csv"
    # [4] "./results/B/data-2.csv"
    
  2. Keep only those that match the expected directory(ies):

    full.matches <- grep(pattern='^\\./results/A/', basename.matches, value = TRUE)
    full.matches
    # [1] "./results/A/data-1.csv" "./results/A/data-2.csv"
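
The two steps can also be folded into one small helper that filters on the whole relative path, which covers the results/[A-Z]+/ case from the question. This is only a sketch: list_files_by_path, its arguments, and the temporary demo tree are hypothetical names introduced for illustration and are not part of base R. It relies on list.files(recursive = TRUE, full.names = FALSE) returning paths relative to the root, such as "results/A/data-1.csv".

```r
# Hypothetical helper (not part of base R): list every file under `root`
# and keep those whose path relative to `root` matches `path_pattern`.
list_files_by_path <- function(path_pattern, root = ".") {
  # With recursive = TRUE and full.names = FALSE the results are relative
  # paths, e.g. "results/A/data-1.csv"
  candidates <- list.files(path = root, recursive = TRUE, full.names = FALSE)
  grep(path_pattern, candidates, value = TRUE)
}

# Build a throwaway copy of the example tree so the sketch runs anywhere
demo_root <- file.path(tempdir(), "full-path-demo")
for (d in c("A", "B")) {
  dir.create(file.path(demo_root, "results", d),
             recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_root, "results", d,
                        c("data-1.csv", "data-2.csv")))
}

# Subdirectories named with uppercase letters only, CSV files only
matches <- list_files_by_path("^results/[A-Z]+/data-.*\\.csv$",
                              root = demo_root)
print(matches)
```

Because no basename pattern is passed to list.files here, every file under the root is listed before filtering. For very large trees it may be cheaper to keep the basename pattern in list.files and apply the path filter afterwards, as in the two steps above.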
    
