Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.6k views
in Technique[技术] by (71.8m points)

regex - How to delete everything after nth delimiter in R?

I have this vector myvec. I want to remove everything after second ':' and get the result. How do I remove the string after nth ':'?

myvec<- c("chr2:213403244:213403244:G:T:snp","chr7:55240586:55240586:T:G:snp" ,"chr7:55241607:55241607:C:G:snp")

result
chr2:213403244   
chr7:55240586
chr7:55241607
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We can use sub. We match one or more characters that are not : from the start of the string (^([^:]+) followed by a :, followed by one more characters not a : ([^:]+), place it in a capture group i.e. within the parentheses. We replace by the capture group (\1) in the replacement.

sub('^([^:]+:[^:]+).*', '\1', myvec)
#[1] "chr2:213403244" "chr7:55240586"  "chr7:55241607" 

The above works for the example posted. For general cases to remove after the nth delimiter,

n <- 2
pat <- paste0('^([^:]+(?::[^:]+){',n-1,'}).*')
sub(pat, '\1', myvec)
#[1] "chr2:213403244" "chr7:55240586"  "chr7:55241607" 

Checking with a different 'n'

n <- 3

and repeating the same steps

sub(pat, '\1', myvec)
#[1] "chr2:213403244:213403244" "chr7:55240586:55240586"  
#[3] "chr7:55241607:55241607"  

Or another option would be to split by : and then paste the n number of components together.

n <- 2
vapply(strsplit(myvec, ':'), function(x)
            paste(x[seq.int(n)], collapse=':'), character(1L))
#[1] "chr2:213403244" "chr7:55240586"  "chr7:55241607" 

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...