Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
655 views
in Technique[技术] by (71.8m points)

r - Find all possible substrings of length n

I have an interesting (only for me, perhaps, :)) question. I have text like:

"abbba"

The question is to find all possible substrings of length n in this string. For example, if n = 2, the substrings are

'ab','bb','ba'

and if n = 3, the substrings are

'abb','bbb','bba'

I thought to use something like this:

x <- 'abbba'
m <- matrix(strsplit(x, '')[[1]], nrow=2)
apply(m, 2, paste, collapse='')

But I got a warning and it doesn't work for len = 3.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

We may use

x <- "abbba"
allsubstr <- function(x, n) unique(substring(x, 1:(nchar(x) - n + 1), n:nchar(x)))
allsubstr(x, 2)
# [1] "ab" "bb" "ba"
allsubstr(x, 3)
# [1] "abb" "bbb" "bba"

where substring extracts a substring from x starting and ending at specified positions. We exploit the fact that substring is vectorized and pass 1:(nchar(x) - n + 1) as starting positions and n:nchar(x) as ending positions.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...