Extracting multiple dataframes from textfile using pattern and expression in R programming

Question

posted Jan 31, 2022 in Technique by 深蓝

I have the following input file containing XY1 series, with many unnecessary lines at the beginning.

Input file:
Unnecessary lines...
Unnecessary lines...
Unnecessary lines...
...........
...........

!Time Step
XY1 3 3 0 0
11908800 5
11912400   200
13737600 200

!Discharge
XY1 1 8 0 0
11908800    1840.593294 
11995200    1840.593294 !Day spin-up
12081600    1840.593294 !Day of simulation
12168000    2831.681991 !Day to ramp up flow
12254400    2831.681991 !Day of Simulation
12340800    4247.522986 !Day to ramp up flow
12427200    4247.522986 !Day of simulation
12513600    4247.522986 !+ 1-hour

!DS tailwater
XY1 2 8 0 0
11908800    103.0224
11995200    103.0224
12081600    103.0224
12168000    103.05288
12254400    103.05288
12340800    103.08336
12427200    103.08336
12513600    103.08336

!DS tailwater2
XY1 3 8 0 0
119088  103.0224
119520  90.0224
120800  115.0224
121000  103.05288
122400  110.05288
123800  103.08336
124200  101.08336
125600  105.08336
!ENDXY1

END

There can be more XY1 series in the input file. I only want the data below each XY1 header that has 8 as its third field (XY1 1 8 0 0, XY1 2 8 0 0, ...), each as its own data frame. I have used grep("^XY1 \\d 8", dat) to find those headers, but I don't know how to loop over them.

Output df1 based on XY1 1 8 0 0:

Node        Value
11908800    1840.593294 
11995200    1840.593294
12081600    1840.593294 
12168000    2831.681991 
12254400    2831.681991 
12340800    4247.522986 
12427200    4247.522986 
12513600    4247.522986 

Output df2 based on XY1 2 8 0 0:

Node        Value
11908800    103.0224
11995200    103.0224
12081600    103.0224
12168000    103.05288
12254400    103.05288
12340800    103.08336
12427200    103.08336
12513600    103.08336 

Output df3 based on XY1 3 8 0 0:
Node    Value
119088  103.0224
119520  90.0224
120800  115.0224
121000  103.05288
122400  110.05288
123800  103.08336
124200  101.08336
125600  105.08336

Thank you so much for your help.
I can use something like this to get the lines:

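rm(list=ls(all=TRUE))
dat <- readLines("D:/Shuvashish/R_adh/AR_20base_201214.bc")
a=grep("^XY1 \\d 8", dat)     # header lines of the wanted XY1 series
b=grep("^!ENDXY1", dat)       # marker placed after the last series

# first series: line after its header up to two lines before the next header
df1 <- read.delim(text=dat[(a[1]+1):(a[2]-2)], sep = "", header = FALSE)
df1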

How can I automate this process with a for or while loop over all the XY1 series, so that the data end up in separate data frames, i.e. df1, df2, df3, etc.? Thanks. If I try to get the next series, the data beneath XY1 2 8 0 0, with the following code:

df2 <- read.delim( text=dat[(a[2]+1):(a[3]-2)],sep = "",header = FALSE)

it throws an error because there is no third XY1 series in my file, so a[3] does not exist. On the other hand, the last element of a (a[length(a)]) only marks the start of the last XY1 series, so how do I get its end line in that case? I put !ENDXY1 after the last XY1 series so that I can grab it using grep:

b=grep("^!ENDXY1", dat)

How can I write a for loop with the right conditions for this case when there are hundreds of XY1 series? I would highly appreciate your help.
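A minimal base R sketch of the loop asked for above. Instead of pairing each header with the next one (a[i + 1]), it looks for the first blank line or line starting with "!" after each header, so the last series needs no extra !ENDXY1 marker. That stopping rule is an assumption based on the sample file, and read_xy1_blocks is a hypothetical name, not part of any package.

```r
# Hypothetical helper (not from the original post): read every series whose
# header matches "XY1 <n> 8" into a named list of data frames df1, df2, ...
# Assumption: a series ends at the first blank line, the first line starting
# with "!", or the end of the file, whichever comes first.
read_xy1_blocks <- function(dat) {
  starts <- grep("^XY1 [0-9]+ 8( |$)", dat)      # header line of each wanted series
  stops  <- grep("^[[:space:]]*$|^!", dat)       # blank lines and "!" lines
  out <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    first <- starts[i] + 1
    nxt <- stops[stops > starts[i]]              # terminators after this header
    last <- if (length(nxt) > 0) nxt[1] - 1 else length(dat)
    if (last < first) {                          # header with no data lines
      out[[i]] <- data.frame(Node = numeric(0), Value = numeric(0))
      next
    }
    block <- sub("!.*", "", dat[first:last])     # drop inline "!" comments
    out[[i]] <- read.table(text = block, col.names = c("Node", "Value"))
  }
  names(out) <- paste0("df", seq_along(out))     # df1, df2, df3, ...
  out
}
```

With the real file this would be called as dfs <- read_xy1_blocks(readLines("try.txt")), and the blocks read as dfs$df1, dfs$df2, and so on. Keeping them in one named list makes it easy to loop over hundreds of series; if separate objects are really wanted, list2env(dfs, envir = globalenv()) creates df1, df2, ... in the workspace.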

1 Reply

深蓝 · Answer 1 · 2022-01-31T07:16:08+0000

In base R you could do:

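# read the whole file as one string, keeping the line breaks
a <- paste0(readLines("try.txt"), collapse = "\n")
# for each XY1 header containing an 8: \K drops the header, then take
# everything up to the next line that starts with "!"
b <- regmatches(a, gregexpr("(?s)XY1[^\n]+8.*?\n\\K.*?\n[!]", a, perl = TRUE))[[1]]
# strip "!" comments from each block, then parse it into a data frame
lapply(b, function(x) 
     read.table(text = gsub("(?m)[!].*", '', x, perl = TRUE), 
                col.names = c("Node", "Value")))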
[[1]]
      Node    Value
1 11908800 1840.593
2 11995200 1840.593
3 12081600 1840.593
4 12168000 2831.682
5 12254400 2831.682
6 12340800 4247.523
7 12427200 4247.523
8 12513600 4247.523

[[2]]
      Node    Value
1 11908800 103.0224
2 11995200 103.0224
3 12081600 103.0224
4 12168000 103.0529
5 12254400 103.0529
6 12340800 103.0834
7 12427200 103.0834
8 12513600 103.0834

[[3]]
    Node    Value
1 119088 103.0224
2 119520  90.0224
3 120800 115.0224
4 121000 103.0529
5 122400 110.0529
6 123800 103.0834
7 124200 101.0834
8 125600 105.0834
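To make the regular expression above easier to follow, here is a small demonstration on a made-up inline string (not from the question). With perl = TRUE, \K discards the header line from the reported match, and the lazy .*?\n[!] stops at the first line that begins with "!", which is why the captured text keeps that trailing "!" and the inline comments until gsub() removes them.

```r
# Made-up sample: one wanted series (header contains 8) with an inline comment
txt <- "!Discharge\nXY1 1 8 0 0\n1 10 !note\n2 20\n\n!End"
# Same pattern as the answer: match the header, \K discards it, then take
# everything lazily up to the first newline that is followed by "!"
m <- regmatches(txt, gregexpr("(?s)XY1[^\n]+8.*?\n\\K.*?\n[!]", txt, perl = TRUE))[[1]]
print(m)  # the captured block still contains "!note" and the final "!"
# Remove each "!" and the rest of its line, then parse the remaining rows
df <- read.table(text = gsub("(?m)[!].*", "", m, perl = TRUE),
                 col.names = c("Node", "Value"))
print(df)
```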
