read.table - Skip over all lines in a data file before and including a regular string in a loop in R

Question


posted Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)


I have an instrument that produces data files with a large amount of header information. I want to read many of these files at once and rbind them together. So far I have been using the following loop, with skip to get past the header:

df <- c()
for (x in list.files(pattern="*.cnv", recursive=TRUE)) {
  u <- read.table(x, skip=100)
  df <- rbind(df, u)
}

Here is an example of what the datafile with 5 lines to skip looks like:

# Header information
# Header information
# Header information
# Header information
# Header information
*END*
      0.571    26.6331     8.2733    103.145     0.0842  -0.000049  0.000e+00
      0.576    26.6316     8.2756    103.171     0.3601  -0.000049  0.000e+00
      0.574    26.6322     8.2744    103.157     0.3613  -0.000046  0.000e+00

The issue is that the number of lines to skip varies from file to file, and I would like a general solution. Fortunately, the header of every file ends with this line:

*END*

So my question is: how can I read in such a file while skipping every line up to and including the *END* line? This would presumably happen before rbind-ing the files together.


1 Reply

深蓝 · Answer 1 · 2022-01-31T07:13:28+0000

Read the whole file into a character vector, one element per line, using readLines:

all_content <- readLines("input.txt")
> all_content
[1] "# Header information"                                                         
[2] "# Header information"                                                         
[3] "# Header information"                                                         
[4] "# Header information"                                                         
[5] "# Header information"                                                         
[6] "*END*"                                                                        
[7] "      0.571    26.6331     8.2733    103.145     0.0842  -0.000049  0.000e+00"
[8] "      0.576    26.6316     8.2756    103.171     0.3601  -0.000049  0.000e+00"
[9] "      0.574    26.6322     8.2744    103.157     0.3613  -0.000046  0.000e+00"

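Then drop every line up to and including *END*. Because * is a regular-expression metacharacter, pass fixed = TRUE so that grep matches the marker literally, and take the first match in case the marker appears more than once:

skip <- all_content[-seq_len(grep("*END*", all_content, fixed = TRUE)[1])]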

Now parse the remaining lines with read.table as usual:

input <- read.table(textConnection(skip))
> input
     V1      V2     V3      V4     V5       V6 V7
1 0.571 26.6331 8.2733 103.145 0.0842 -4.9e-05  0
2 0.576 26.6316 8.2756 103.171 0.3601 -4.9e-05  0
3 0.574 26.6322 8.2744 103.157 0.3613 -4.6e-05  0

You get the desired result.
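
As a variation on the steps above, the position of the marker can also be passed straight to read.table's skip argument, which avoids the text connection. This is only a sketch: the temporary file and the names sample_file and end_line are made up for illustration, and it assumes the marker sits alone on its line.

```r
# Write a small sample file so the sketch is self-contained (illustrative data).
sample_lines <- c(
  "# Header information",
  "# Header information",
  "*END*",
  "      0.571    26.6331     8.2733    103.145     0.0842  -0.000049  0.000e+00",
  "      0.576    26.6316     8.2756    103.171     0.3601  -0.000049  0.000e+00"
)
sample_file <- tempfile(fileext = ".cnv")
writeLines(sample_lines, sample_file)

# Index of the first line that is exactly "*END*", ignoring surrounding whitespace.
end_line <- which(trimws(readLines(sample_file)) == "*END*")[1]

# Skip everything up to and including that line, then parse the rest.
input <- read.table(sample_file, skip = end_line)
```

Comparing with == instead of grep sidesteps the regular-expression question entirely, at the cost of requiring an exact match on the line.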

UPDATE

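In your loop, apply the same steps to each file:

df <- NULL
# list.files() takes a regular expression, so match the extension explicitly
for (x in list.files(pattern = "\\.cnv$", recursive = TRUE)) {
   all_content <- readLines(x)
   # drop the header, up to and including the first "*END*" line
   skip <- all_content[-seq_len(grep("*END*", all_content, fixed = TRUE)[1])]
   input <- read.table(textConnection(skip))
   df <- rbind(df, input)
}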
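
If you would rather not grow df inside the loop, the same steps can be wrapped in a function and the results combined in one call. A minimal sketch, assuming every file has the same columns; read_after_end, demo_dir and the two demo files are hypothetical names used only to make the example runnable.

```r
# Hypothetical helper: read the table that follows the first marker line.
read_after_end <- function(path, marker = "*END*") {
  lines <- readLines(path)
  hit <- grep(marker, lines, fixed = TRUE)  # literal match, not a regex
  if (length(hit) == 0) {
    stop("marker not found in ", path)
  }
  # Drop lines 1..hit[1] and parse what is left.
  read.table(text = lines[-seq_len(hit[1])])
}

# Demo on two temporary files with different header lengths (made-up data).
demo_dir <- tempfile("cnv_demo")
dir.create(demo_dir)
writeLines(c("# h", "*END*", "1 2 3", "4 5 6"), file.path(demo_dir, "a.cnv"))
writeLines(c("# h", "# h", "# h", "*END*", "7 8 9"), file.path(demo_dir, "b.cnv"))

files <- list.files(demo_dir, pattern = "\\.cnv$", recursive = TRUE,
                    full.names = TRUE)

# Read every file, then bind the resulting data frames row-wise in one step.
df <- do.call(rbind, lapply(files, read_after_end))
```

Stopping with an error when the marker is missing is a design choice here; without it, a file lacking *END* would fail with a less obvious message.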
