python 2.7 - Quick implementation of character n-grams for word

Question

posted Oct 17, 2021 in Technique by 深蓝

I wrote the following code for computing character bigrams, and the output is shown below. My question is: how do I get output that excludes the trailing single character (i.e., 't')? And is there a quicker, more efficient method for computing character n-grams?

>>> b = 'student'
>>> y = []
>>> for x in range(len(b)):
...     n = b[x:x+2]
...     y.append(n)
...
>>> y
['st', 'tu', 'ud', 'de', 'en', 'nt', 't']

Here is the result I would like to get: ['st', 'tu', 'ud', 'de', 'en', 'nt']

Thanks in advance for your suggestions.

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:46:14+0000

To generate bigrams, stop the range one index early so that every slice is exactly two characters long:

In [8]: b='student'

In [9]: [b[i:i+2] for i in range(len(b)-1)]
Out[9]: ['st', 'tu', 'ud', 'de', 'en', 'nt']
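The stray 't' in the question's output comes from how Python slicing treats out-of-range indices: a slice whose end runs past the string is silently truncated instead of raising an error. The sketch below (variable names are illustrative, not from the answer) shows that behavior, the question's loop as a comprehension, the `range(len(word) - 1)` fix, and an after-the-fact length filter that gives the same result but builds the unwanted slice first. It runs on both Python 2.7 and Python 3.

```python
word = 'student'

# Slicing past the end of a string does not raise IndexError; the slice is
# simply truncated, which is where the one-character 't' came from.
last = word[len(word) - 1:len(word) + 1]   # 't'

# Looping over every start index keeps that short trailing slice...
with_tail = [word[i:i + 2] for i in range(len(word))]
# ['st', 'tu', 'ud', 'de', 'en', 'nt', 't']

# ...while stopping one index earlier keeps only full-length bigrams.
bigrams = [word[i:i + 2] for i in range(len(word) - 1)]
# ['st', 'tu', 'ud', 'de', 'en', 'nt']

# Filtering short slices out afterwards also works, but does extra work.
filtered = [g for g in with_tail if len(g) == 2]
```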

To generalize to any n, stop the range at len(b) - n + 1:

In [10]: n=4

In [11]: [b[i:i+n] for i in range(len(b)-n+1)]
Out[11]: ['stud', 'tude', 'uden', 'dent']
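A minimal sketch that wraps the answer's expression in a reusable function, plus an alternative formulation that zips n staggered copies of the word (`zip` stops at the shortest input, so no short trailing n-grams appear). The function names and the `n < 1` check are hypothetical additions, not part of the answer. Both versions do roughly len(word) × n character work, so neither is asymptotically faster; if speed matters, it is worth timing them on your own data with the standard `timeit` module. Both run on Python 2.7 and Python 3.

```python
def char_ngrams(word, n):
    """Return the contiguous character n-grams of `word` (slicing version).

    When n > len(word), range(len(word) - n + 1) is empty, so the result
    is an empty list rather than an error.
    """
    if n < 1:  # hypothetical guard: n = 0 would otherwise yield empty strings
        raise ValueError("n must be a positive integer")
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def char_ngrams_zip(word, n):
    """Same result, built by zipping n copies of the word shifted by 0..n-1.

    e.g. for 'student', n=2: zip('student', 'tudent') -> ('s','t'), ('t','u'), ...
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    shifted = [word[i:] for i in range(n)]
    return [''.join(chars) for chars in zip(*shifted)]
```

Usage: `char_ngrams('student', 4)` and `char_ngrams_zip('student', 4)` both return `['stud', 'tude', 'uden', 'dent']`, and both return `[]` when n is longer than the word.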