Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
216 views
in Technique[技术] by (71.8m points)

Match two character strings by location in R

string <- paste(append(rep(" ", 7), append("A", append(rep(" ", 8), append("B", append(rep(" ", 17), "C"))))), collapse = "")
text <- paste(append(rep(" ", 7), append("I love", append(rep(" ", 3), append("chocolate", append(rep(" ", 9), "pudding"))))), collapse = "")

string
[1] "       A        B                 C"
text
[1] "       I love   chocolate         pudding"

I am trying to match letters in "string" with text in "text" such that to the letter A corresponds the text "I love" to B corresponds "chocolate" and to C "pudding". Ideally, I would like to put A, B, C in column 1 and three different rows of a dataframe (or tibble) and the text in column 2 and the corresponding rows. Any suggestion?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

It is hard to know whether the strings in which you are trying to manipulate and then collate into columns in a data.frame follow a pattern. But for the example you posted, I suggest creating a list with the strings (strings):

strings <- list(string, text)

Then use lapply() which will in turn create a list for each element in strings.

res <-lapply(strings, function(x){
  grep(x=trimws(unlist(strsplit(x, "\s\s"))), pattern="[[:alpha:]]", value=TRUE)
})

In the code above, strsplit() splits the string whenever two spaces are found (\s\s). But the resulting split is a list with the strings as inner elements. Therefore you need to use unlist() so you can use it with grep(). grep() will select only those strings with an alphanumeric character --which is what you want.

You can then use do.call(cbind, list) to bind the elements in the resulting lapply() list into columns. The dimension must match for this work.

do.call(cbind, res)

Result:

> do.call(cbind, res)
     [,1] [,2]       
[1,] "A"  "I love"   
[2,] "B"  "chocolate"
[3,] "C"  "pudding"  

You can wrap it up into a as.data.frame() for instance to get the desired result:

> as.data.frame(do.call(cbind, res), stringsAsFactors = FALSE)
  V1        V2
1  A    I love
2  B chocolate
3  C   pudding

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...