Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

R grep by regex - finding a string that contains a substring exactly once

I am using R on Ubuntu and trying to go over a list of files; some of them I need and some I don't.

I try to pick out the ones I need by looking for a substring that must appear in the name exactly once.

I am using the function grep, which I found here: grep function in R

and the regex rules that I found here: regex rules

Taking this simple example:

a <- c("a","aa") 
grep("a{1}", a) 

I would expect to get only the strings that contain "a" exactly one time, but instead I get both of them.

When I use 2 instead of 1, I do get the expected result: one string (the one that contains "aa").

I can't use $ because the substring is not at the end of the names I need. For example, given the two names "germ-pass.tab" and "germ-pass_germ-pass.tab", I need to return only the first, which contains "germ-pass" once and only once.

I can't use ^a either, because I don't want words such as "aca".

Thanks.

1 Reply

As I said in the comments, grep looks for the pattern anywhere inside your string, and there is indeed an "a" (or "a{1}", which means the same thing to grep) in "aa". You need to add to the pattern that the "a" is followed by something other than "a": "a[^a]":

grep("a[^a]", c("aa", "ab"), value=TRUE)
#[1] "ab"
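
To see why the unanchored pattern matches both strings, here is a small sketch. The anchored pattern "^[^a]*a[^a]*$" is not from the answer above; it is one assumed way to express "exactly one a anywhere in the string" for the single-character case, and the sample vector is made up for illustration.

```r
a <- c("a", "aa", "aca", "b")

# Unanchored: "a{1}" is satisfied by any single "a" inside the string,
# so every string containing at least one "a" matches.
at_least_one <- grep("a{1}", a, value = TRUE)

# Anchored (assumed alternative): any non-"a" characters, one "a",
# then any non-"a" characters up to the end of the string.
exactly_one <- grep("^[^a]*a[^a]*$", a, value = TRUE)

print(at_least_one)
print(exactly_one)
```

Note that "a[^a]" on its own requires some character after the "a", so it would not match the bare string "a"; anchoring both ends avoids that.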

EDIT

Considering your specific problem, it seems you can try the "opposite" approach: filter out the strings that contain more than one occurrence of the pattern, using a "capture" of the pattern:

!grepl("(ab).+\\1", c("ab.t", "ab-ab.t"))
#[1]  TRUE FALSE

!grepl("(ab).*\\1", c("ab", "ab-ab","ab-cc-ab", "abab"))
#[1]  TRUE FALSE FALSE FALSE

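The parentheses capture the pattern (here ab, but it can be any regex), the .* matches "anything" zero or more times, and the backreference \1 asks for a repeat of the captured text. In an R string literal the backslash itself must be escaped, so the backreference is written "\\1". Note that negating the result also returns TRUE for strings that do not contain the pattern at all.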
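
As an alternative to the backreference trick, one could count the occurrences directly and keep the names where the count is exactly one. This is a different technique from the one in the answer, shown only as a minimal sketch: count_fixed is a hypothetical helper (not part of base R), and "other.tab" is a made-up file name added to show the zero-occurrence case.

```r
# Hypothetical helper: count non-overlapping occurrences of a literal
# substring in each element of x. fixed = TRUE treats sub as plain text,
# so regex metacharacters in it need no escaping.
count_fixed <- function(x, sub) {
  vapply(
    gregexpr(sub, x, fixed = TRUE),
    function(m) sum(m > 0),  # gregexpr returns -1 when there is no match
    integer(1)
  )
}

files <- c("germ-pass.tab", "germ-pass_germ-pass.tab", "other.tab")
counts <- count_fixed(files, "germ-pass")

# Keep only the names that contain the substring exactly once.
once <- files[counts == 1]
print(once)
```

Unlike the negated grepl, this also drops names that do not contain the substring at all.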
