Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



regex - Extract capture group matches from regular expressions? (or: where is gregexec?)

Given a regular expression containing capture groups (parentheses) and a string, how can I obtain all the substrings matching the capture groups, i.e., the substrings usually referenced by "\\1", "\\2"?

Example: consider a regex capturing digits preceded by "xy":

s <- "xy1234wz98xy567"

r <- "xy(\\d+)"

Desired result:

[1] "1234" "567" 

First attempt: gregexpr:

regmatches(s,gregexpr(r,s))
#[[1]]
#[1] "xy1234" "xy567" 

Not what I want because it returns the substrings matching the entire pattern.
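One base R route stays within gregexpr(). When called with perl = TRUE, gregexpr() also attaches "capture.start" and "capture.length" attributes that give the position of every capture group in every match, and substring() can turn those positions back into text. A minimal sketch, assuming the pattern is valid PCRE syntax (perl = TRUE switches regex engines) and that there is at least one match (with no match, the recorded positions are -1):

```r
s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

# With perl = TRUE, gregexpr() also records where each capture group matched:
# "capture.start" and "capture.length" are matrices with one row per match
# and one column per capture group
m <- gregexpr(r, s, perl = TRUE)[[1]]
starts <- attr(m, "capture.start")
lens <- attr(m, "capture.length")

# substring() is vectorised over first/last and reads the matrices column by
# column, so reshape the result back to one column per capture group
groups <- matrix(substring(s, starts, starts + lens - 1L), ncol = ncol(starts))
groups[, 1]
#> [1] "1234" "567"
```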

Second try: regexec:

regmatches(s, regexec("xy(\\d+)", s))
#[[1]]
#[1] "xy1234" "1234" 

Not what I want because it returns only the first match: the substring matching the entire pattern plus the substring matching the capture group.
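regexec() can still handle an arbitrary pattern if it is applied to each full match that gregexpr() found, rather than to the whole string. A sketch under the assumption that there is at least one match; note that it re-matches each extracted substring in isolation, so constructs that depend on the surrounding text may behave differently:

```r
s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

# Step 1: every full-pattern match, as gregexpr() + regmatches() already give
full <- regmatches(s, gregexpr(r, s))[[1]]

# Step 2: run regexec() on each full match separately; each list element is
# c(whole match, group 1, group 2, ...)
parts <- regmatches(full, regexec(r, full))

# Step 3: drop the whole-match entry and stack the rest, one row per match
# (assumes at least one match; with none, do.call() returns NULL)
groups <- do.call(rbind, lapply(parts, `[`, -1))
groups[, 1]
#> [1] "1234" "567"
```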

If there were a gregexec function, extending regexec as gregexpr extends regexpr, my problem would be solved.
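Base R has since gained this function: gregexec() was added in R 4.1.0, and regmatches() accepts its result. A minimal sketch, assuming R 4.1.0 or later:

```r
# Requires R >= 4.1.0, the release that added gregexec() to base R
s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

# regmatches() on a gregexec() result returns one character matrix per input
# string: row 1 holds the full matches, row k + 1 holds capture group k,
# and there is one column per match
m <- regmatches(s, gregexec(r, s))[[1]]
m[2, ]
```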

So the question is: how can I retrieve all substrings (or indices that can be passed to regmatches, as in the examples above) matching the capture groups of an arbitrary regular expression?

Note: the pattern r given above is just a toy example; the solution must work for an arbitrary pattern.


1 Reply


For a base R solution, what about just using gsub() to finish processing the strings extracted by gregexpr() and regmatches()?

s <- "xy1234wz98xy567"
r <- "xy(\\d+)"

gsub(r, "\\1", regmatches(s, gregexpr(r, s))[[1]])
# [1] "1234" "567" 
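The same gsub() idea can be wrapped in a small helper that returns any single capture group k. The name extract_group is hypothetical, not part of base R. Because replacement strings only support the back-references \\1 to \\9, k is limited to 1 through 9 in this sketch:

```r
# extract_group() is a hypothetical helper, not a base R function.
# It returns capture group k (1 to 9) of every match of pattern r in string s.
extract_group <- function(s, r, k = 1L) {
  stopifnot(length(s) == 1L, k >= 1L, k <= 9L)
  # Full-pattern matches, exactly as in the answer above
  full <- regmatches(s, gregexpr(r, s))[[1]]
  # Replacing each full match with the back-reference "\\k" keeps only group k
  gsub(r, paste0("\\", k), full)
}

s <- "xy1234wz98xy567"
extract_group(s, "xy(\\d+)")
#> [1] "1234" "567"
extract_group(s, "(x)(y)(\\d+)", k = 3)
#> [1] "1234" "567"
```

With no match at all, gregexpr() finds nothing and the helper returns character(0).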
